ABAP
Recommendations
Table of
Contents:
SELECT Statements
The most
frequently used method of extracting data from the database is by using the SELECT
statement which makes it important that this is correctly written, the database
optimizer selects and uses database indexes depending partly on the WHERE
conditions specified in the SELECT statement. The order of the WHERE conditions
in the SQL statement is not important.
Select
Statement Recommendations
- Try to
avoid using SELECT…ENDSELECT loops since they are almost always less
efficient than set processing using a SELECT statement.
- It’s
usually faster to use INTO TABLE rather than APPENDING table if the
receiving internal table is empty.
- Use a SELECT
list instead of SELECT * if you are only interested in specific columns of
the table, the network load is considerably less.
- Use
explicit field lists rather than INTO CORRESPONDING FIELDS OF since it is
more efficient and easier for maintenance to see which fields are moved
where.
- Use
aggregate functions (COUNT, SUM, MAX, MIN, and AVG) to perform
calculations if they significantly reduce the quantity of data records to
be transferred, they are usually used with a GROUP BY clause.
- To
apply a logical condition to the groups defined in the GROUP BY clause,
use HAVING. The data quantity to be transferred depends on the selectivity
of the HAVING clause. Using HAVING
clauses is similar to using aggregate functions; they create an extra load
on the database so only use them if they reduce the quantity of data
records to be transferred. Before you use a HAVING clause, check whether
the condition can be specified in the WHERE clause instead.
- If you
are interested if there exists at least one row of a database table or
view with a certain condition, use SELECT ... UP TO 1 ROWS. If all primary key fields are supplied
in the WHERE condition you can also use SELECT SINGLE.
- Always
specify your conditions in the WHERE-clause instead of checking them
yourself with CHECK statements. The
database system is then given the opportunity to use an index (if
possible) and the network load is considerably less.
- Try to
use the full primary index in the WHERE clause of SELECT statements for
optimum performance. For Outbound
BADI processing, this may involve pulling the LOGSYS field into the field
list of an extract so that it can be used in the BADI.
- Formulate
WHERE statements positively wherever possible. If fields in the WHERE clause are
specified with operators NOT or <>, these conditions cannot be used
for a search over a database index. If a positive formulation cannot be used,
for example because the IN list would be too long, you should still specify
the WHERE condition with NOT to reduce the amount of data to be
transferred. Use WHERE with BETWEEN
and WHERE with LIKE, >, or <
and WHERE with OR only if
necessary.
- A field
specified with LIKE can narrow the selection of database indexes and be
used for a search over a database index only if it does not begin with the
wildcard character _ or %. In ABAP, the wildcard + is used for any
character, and * is used for any character string. For database accesses,
however, the characters _ and % are used.
- If you
want the results of a SELECT statement to be returned in the sorted order
of the table index, use an ORDER BY clause on the database. Otherwise use SORT <internal table>
in the program.
- RANGES
tables can be defined using the ABAP statements SELECT-OPTIONS and
RANGES. They implicitly define an
internal table with the SIGN, OPTION, LOW, and HIGH fields. The RANGES
table can be populated by the end user at the selection screen dynamically
or by the program. The database
interface converts the rows in a RANGES table so that they can be read by
the DBMS, and links them with OR. Since the RANGES table can contain rows
with complex expressions (e.g. BT, CP). The SQL statement that results can
be complex and difficult for the DBMS to process so USE WITH CAUTION.
Database Joins
In
ABAP, there are various ways to implement the Join operator:-
v
Nested SELECTs – These
have several disadvantages and should be avoided:
-
Data records of the inner table are
transferred row-by-row to the driving table
-
Lots of small fetches, instead of fewer and
more compact fetches.
-
The data records of the inner table are read
multiple times.
v
SELECT FOR ALL ENTRIES - Enables
you to create a join between an internal table and a database table, when using
this, the following problems can occur:-
-
If the internal table is empty selection is
not limited and the WHERE clause conditions are not evaluated so the whole Database
table is likely to be selected.
-
Duplicate entries should be deleted from the
internal table before executing the SELECT... FOR ALL ENTRIES, otherwise,
identical data is read unnecessarily from the database.
v
DB VIEWS – These are defined in the ABAP
dictionary and activated to create them in the database. DB views created by
other developers can also be used (reusability).
v
ABAP JOINs - These are defined in an ABAP
program. To read data distributed over
various tables, you must create a Join between tables using the JOIN operator.
-
An INNER JOIN produces a result set that
contains only the records that match in both sides of the Join.
-
A LEFT or OUTER JOIN produces a resulting set
containing all data records from the outer table regardless of whether matching
records exist in the inner table, if there are no matching data records in the
inner table, NULL values are returned for the fields in the inner table.
Join Recommendations
1.
Instead of a SELECT into one internal table
and LOOP AT …. READ to check the contents of a second internal table, use DataBase
Join between the two tables wherever possible.
2.
If using a DataBase Join ensure that it uses
indexes on both sides of the join.
3.
If it’s possible to select all data from
database tables then the ABAP JOIN method is preferable, if it’s necessary to
select data from the database based on the contents of an internal table then
the FOR ALL ENTRIES must be used. One
significant difference between these two methods is that FOR ALL ENTRIES will
not reflect duplicates in the internal table, so duplicates in the itab
matching a single row in the database table will return a single record. A database JOIN will reflect duplicates in
both sides of the.
4.
If the data is to be selected from more than
2 database tables then it may be efficient to write a multi-way join or better
to break it down into several steps, this depends on the circumstances so some
trial and error will be necessary.
Other Statements
Recommendations
1.
Avoid using MOVE-CORRESPONDING since it can
be more expensive in processing than explicit MOVE statements.
2.
Avoid unnecessary MOVEs by using the
following explicit work area operations where appropriate:-
APPEND wa TO itab.
INSERT wa INTO itab.
COLLECT wa INTO itab.
MODIFY itab FROM wa.
READ TABLE itab INTO wa.
LOOP AT itab INTO wa.
INSERT wa INTO itab.
COLLECT wa INTO itab.
MODIFY itab FROM wa.
READ TABLE itab INTO wa.
LOOP AT itab INTO wa.
5.
LOOP ... WHERE is faster than LOOP/CHECK
because LOOP ... WHERE evaluates the specified condition internally.
6.
The performance of logical expressions is
better if the operands being compared have the same data type.
7.
CASE statements are clearer and a little
faster than IF-constructs.
8.
If you can use WHILE instead of a
DO+EXIT-construction, then do so. WHILE
is easier to understand and faster to execute.
Internal Tables
The
rows of internal tables can be identified either via the work area from which
the key values are taken (in an implicit key access); or via an explicit key
specification using WITH [TABLE] KEY. Key accesses are possible for all table
types, but they show different levels of
performance
for each table type.
There
are three types of internal tables:
v
In a STANDARD table, you can access data
using either the table index or the key.
If access is the primary key, the response time is linearly related to
the number of table entries. The key of a standard table is always non-unique.
v
SORTED tables are always stored in ascending
order according to their key. You can access them using either the table index
or the key. If you use the key the response time is in logarithmic relation to
the number of table entries because the system uses a binary search to access
the table. The key can be either UNIQUE or NON-UNIQUE. STANDARD tables and SORTED tables are
generically known as index tables, since their values can be accessed via an
index.
v
HASHED tables can only be accessed via the
primary key. The response time is constant, regardless of the number of table entries,
because access proceeds via a hash algorithm which is determined internally so
they are useful for reading a single data record via the key. The key of a
HASHED table must be unique and you cannot access hash tables through indexes.
Internal
Table Recommendations
1.
INSERT for a standard table or hashed table
using the key has the same effect as an APPEND statement, for hashed tables,
however, the address of the data row to be entered must also be
calculated. For inserts in SORTED
tables, the sort sequence must be strictly observed, which means the access
costs grow with increasing table size.
2.
For a STANDARD table, it is generally best
performance to sort according to the relevant fields and subsequently execute a
READ... BINARY SEARCH.
3.
With the COLLECT statement, the contents of
the work area is added to an entry with the same key or added to the table as a
new entry. This allows you to create aggregated internal tables, so reducing
data volume.
4.
If you need the COLLECT semantics, DO use
COLLECT. When using READ BINARY the
internal table's index must be adjusted with each INSERT. COLLECT, however, uses a hash algorithm and
is therefore independent of the number of entries and does not need to maintain
a table index. If you need the final data sorted, sort it after all data has
been collected. If the amount of data is
small, the READ/INSERT approach isn't bad, but for large amounts of data (>
1000), COLLECT is much faster.
5.
Do not use COLLECT to populate an internal
table in combination with any other table-filling statements (APPEND, INSERT,
MODIFY, SELECT * INTO TABLE and/or SELECT * APPENDING TABLE) because then the COLLECT
cannot use its hash algorithm but resorts to a normal linear search, which is
dramatically slower.
6.
Use APPEND LINES OF instead of LOOP AT…APPEND…ENDLOOP, if possible because the former
statement is far more efficient.
7.
Mass data processing should not be used if
the internal table has a deep row structure, that is, contains nested tables.
In this case, you require single data record processing with SELECT ...
ENDSELECT.
8.
The internal table should have a structure
similar to the field list, so that it is unnecessary to use INTO CORRESPONDING
FIELDS (which must perform the additional work of comparing the field
names).
9.
If a target table is of type SORTED, it
automatically sorts the inserted data. If APPENDING is used, the new data
records are added in the sorted order.
10. If an internal
table key is PART of the primary key of the database table, it cannot be
defined as UNIQUE because duplicates may occur. These duplicates cause runtime
errors.
11. If you are populating
an internal table from an internal table, use mass processing instructions APPEND/INSERT
LINES. They can be used to extend an
existing internal table, you can copy an interval of lines from one internal
table to another using a FROM...TO.
12. If you want
all key fields of the database table to appear in the internal table, but not
in the key, you cannot use a SELECT DISTINCT.
Instead, this can be optimally realized using DELETE ADJACENT DUPLICATES
FROM This checks whether adjacent rows have the same key. The internal table
should be sorted. COMPARING lets you
define duplicates in terms of other keys.
Index Considerations
An
index should not consist of too many fields. Having a few very selective fields
increases the chance of reusability, and reduces the chance of the database
optimizer selecting an unsuitable access path.
There
are several types of Index Scan
v
INDEX
UNIQUE SCAN: The index selected is unique (primary index or unique secondary index) and
fully specified. This type of access is very effective and will be used if all
fields in the index are specified with an equal (=) sign in the WHERE
clause. If a further WHERE condition is
specified in the SELECT statement for a field that is not in the index, this is
evaluated after the table records had been read from the table blocks.
v
INDEX RANGE SCAN: The index selected is unique or
non-unique. For a non-unique index this means that not all index fields are
specified in the WHERE clause and a range of the index is read and checked. An index
range scan may not be as effective as a full table scan. Outdated update statistics, database
optimizer errors, missing indexes, or inappropriate ABAP coding can all mean
that the database optimizer may select a completely unsuitable index and then
perform an unselective index range scan.
v
FULL
TABLE SCAN: The whole table is read sequentially. Each table block is read
once. Since no index is used, no index
blocks are read. The full table scan
access strategy is very effective if a large part of a table (for example, 5%
of all table records) needs to be read.
Index
Recommendations
1.
Create as few indexes as possible per table
(as few as possible but as many as necessary).
2.
Selection of index fields
-
As few fields as possible in index (as a
general rule – 4 fields)
-
Fields that are as selective as possible
-
Selective fields as near to the beginning as
possible
3.
Do not change SAP standard indexes
Slow SELECT Statement Checklist
If
a SELECT statement is running slowly consider the following:-
v
Changing the WHERE clause to use an existing
index on the table
v
Adding a secondary index to the table to
fulfill the needs of the selection
v
Alter the WHERE clause to reduce the number
of rows returned by limiting the selection or by aggregate functions
v
Reduce the columns to be transferred by
formulating a suitable field list.
Slow LOOP Checklist
If a LOOP is taking a long time consider the following:
v
If
internal tables being READ inside the loop are HASHED consider changing them to
STANDARD, if they are STANDARD consider changing them to HASHED.
v
Explore
the possibility of taking complex calculations out of the loop.
v
Populating
some of the internal tables in one go using a SELECT statement with a JOIN
before the loop so that fewer or no READs of internal tables are needed in the
loop.
No comments:
Post a Comment