Oracle Concepts Course - Part 2

Courtesy: Dizwell

The Structure of the Oracle Instance

An Oracle instance is merely an allocation of memory in which to do work plus a bunch of background processes which automate a lot of that work for us. The memory areas are collectively known as the System Global Area (or SGA for short). The constituent 'pools' of memory, which in the aggregate make up an SGA, are called The Shared Pool, The Buffer Cache and The Log Buffer. These three SGA components are compulsory and every SGA on the planet will have all three of them. There are various additional 'pools' which your SGA might or might not have configured, depending on how you are using your database, what software options you've enabled (and paid for!) and so on. You will commonly see, for example, a Java Pool and a Large Pool. You will less commonly see (in 10g and above) a Streams Pool.
 
No matter what fancy names these pools of memory are given, they are all merely chunks of memory in which different sorts of data are stored. The principal function of all of them is to try to 'cache' that data in memory so that you do not have to fetch it back off disk or work it out from scratch (both of which options are relatively slow and expensive).
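You can actually see these pools for yourself by querying the standard dynamic performance views. What follows is just a quick sketch (run it as a suitably privileged user; the exact rows and sizes you get back will depend on your version and configuration):

-- The four headline figures: the fixed area, the variable area (which
-- includes the Shared, Large, Java and Streams Pools), the Buffer Cache
-- ('Database Buffers') and the Log Buffer ('Redo Buffers')
select * from v$sga;

-- A pool-by-pool breakdown of the variable area
select pool, round(sum(bytes)/1024/1024) as mb
from   v$sgastat
group  by pool;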

The Shared Pool

When you issue a query such as select * from emp, the database must work out what that means. Do we have a table called EMP? What are its constituent columns? Where, physically, on disk, does the table live? Are there any indexes built on the table which might help us get the results back more quickly? Do you, the user, have rights to query the table? The process of answering all these sorts of questions is known as parsing a SQL statement, and it involves a lot of work going on 'behind the scenes'.

To work out whether a table exists at all, for example, we have to query an in-built system table called TAB$. To work out where it physically exists on disk, we have to query more in-built system tables, including one called TS$. To check if you have the rights to view the table, yet another query is made of a table called USER$ (amongst others). You issue a simple select statement, in other words, and the poor database has to issue a dozen queries of assorted system tables before it can even work out what your statement means. We call these 'system SQL statements issued in response to a user SQL statement' recursive SQL, and performing recursive SQL means performing lots of preparatory work before actually answering your SQL requests.

Parsing is therefore very expensive... so we'd really like to do it only once and have the results of the parse stored so that you and other users can re-use them every time the same basic SQL statement is issued in the future. The result of a parse is known as an execution plan, and it represents a package of pre-worked-out instructions on how to answer a SQL statement. Each new SQL statement causes a new execution plan to be constructed and stored... and they are stored in a sub-section of the Shared Pool known as the Library Cache. If a second user issues exactly the same SQL statement as someone else happened to issue earlier, therefore, the second user can simply nip into the Library Cache and make use of the execution plan previously worked out for the first user.

If you have to go through the entire rigmarole of working out an execution plan from scratch, that's called 'doing a hard parse'. If you can merely re-use execution plans previously created by other users, that's called 'doing a soft parse', and soft parses are a lot cheaper and quicker to carry out than hard ones.
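You can watch the distinction in action, too. A rough sketch, using the standard V$SQL and V$LIBRARYCACHE views (the exact figures will obviously vary from database to database):

-- One row per statement currently held in the Library Cache. A statement
-- with many executions but few parse calls is being re-used rather than
-- hard parsed each time.
select sql_text, parse_calls, executions
from   v$sql
where  sql_text like 'select * from emp%';

-- Overall Library Cache efficiency: a GETHITRATIO close to 1 means most
-- look-ups found an existing entry instead of forcing a hard parse.
select namespace, gethitratio
from   v$librarycache;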

You can fairly say, therefore, that the job of the Library Cache is to help stop us having to do hard parses.

Incidentally, fetching the contents of system tables like TAB$, USER$ and so on is expensive, too: these tables are collectively known as the data dictionary, and the data dictionary is physically stored on disk in the system datafile. Forever having to access these tables off disk would not be a good idea: lots of disk access generally means "slow performance". Therefore, once we've read some rows from these tables, we store them for future access in another component of the Shared Pool called the Data Dictionary Cache. The data dictionary cache is where in-memory copies of the data dictionary tables, needed to carry out parsing operations, reside.

The job of the dictionary cache is therefore to stop us having to visit the system data file on disk every time we parse.
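The dictionary cache has its own performance view, V$ROWCACHE, if you want to check how well it is doing that job. A minimal sketch:

-- GETS is the number of requests for dictionary data; GETMISSES is the
-- number that could not be satisfied from memory and had to visit the
-- system datafile on disk. We want the misses to be a small fraction.
select parameter, gets, getmisses
from   v$rowcache
order  by gets desc;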

Since the Library Cache and the Data Dictionary Cache are both sub-components of the Shared Pool, and since both are designed to minimise parsing activity, or at least make it an in-memory affair if it does have to happen, it is reasonable to conclude that the job of the Shared Pool overall is to make parsing SQL statements as cheap an affair as possible.

The Buffer Cache

The Buffer Cache is a memory area in which the data physically stored in the database is read and modified. When you want to see the employee records in the EMP table, they are first read from disk and loaded into the Buffer Cache. Why not simply fetch the rows off disk and straight back to you and your client PC? Because someone else might want to read the EMP records just after you. If we loaded the rows into the Buffer Cache the first time round, the second person won't have to read the rows from disk at all: they'll be able to access the copy of them directly from the Buffer Cache. Reading rows of data off disk involves performing a physical read. Reading rows out of the Buffer Cache copy of them involves performing a logical read. Logical reads don't require physical disk access and so are much quicker to perform than physical reads. The job of the Buffer Cache can be said to be, therefore, to minimise physical reads and to help maximise logical ones.
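The instance keeps running totals of both kinds of read, so you can see the split for yourself. A quick sketch against V$SYSSTAT:

-- 'session logical reads' counts reads satisfied from the Buffer Cache;
-- 'physical reads' counts blocks that actually had to be fetched from disk.
select name, value
from   v$sysstat
where  name in ('session logical reads', 'physical reads');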

It's not just reading data, though. If you want to update a record, the modification is made to the version of the data stored in the Buffer Cache. Only later is this modified buffer (known as a dirty buffer) then saved back down to disk. When we write a modified buffer back down to disk, the contents of the in-memory buffer become the same as the on-disk data, and we therefore say that the buffer is now a clean buffer.
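If you're curious, the V$BH view has one row for every buffer currently sitting in the Buffer Cache, and will show you how many of them are dirty at any given moment. A small sketch (on a busy instance this view can be very large, so don't make a habit of it):

-- DIRTY = 'Y' marks buffers that have been changed in memory but not
-- yet written back to their datafile on disk.
select dirty, count(*)
from   v$bh
group  by dirty;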

Oracle never reads individual rows, though. If you want to update Jim's phone number in the EMP table, we never just get Jim's record and let you at it. The minimum unit of reading and writing is an entire "collection" of rows - usually 2K, 4K, 8K or 16K in size. The idea is that if I am modifying Jim's phone number, you will probably want to update Bob's phone number similarly in just a moment or two... so my request to update Jim's record helps you out, because it causes the entire collection of rows around Jim's row to be loaded into memory at the same time. Your subsequent update to a different row hopefully therefore only involves logical I/O, not physical I/O - and you therefore benefit from my earlier work.

This 2K, 4K, 8K or 16K collection of data is known as the Oracle Block and is the minimum unit of I/O in an Oracle database. Request to see just one byte in a block, and the entire block is read into the Buffer Cache. The term "block" is actually used to describe the collection on disk; when a block from disk is read into memory, we call it a "buffer" instead. Essentially, though, a block and a buffer are the same thing, except that one physically exists and the other is only an in-memory thing. Some other database products (such as SQL Server) call the same thing a "page" of data. Blocks, buffers, pages... they're all really just different words for the same idea: a minimum 'chunk' of data read from disk into memory, and stored in memory for other users to find without having to re-visit the physical disk.
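The block size is fixed when the database is created, and you can check what yours is easily enough. From SQL*Plus, for example, something along these lines:

-- The default block size for this database, in bytes (8192 = 8K is common)
select value
from   v$parameter
where  name = 'db_block_size';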

The Log Buffer

It is impossible to generalise, but if there is such a thing as a 'typical' Oracle database, you might measure the size of the Shared Pool and the Buffer Cache in the several-hundred-megabyte range, or bigger. In comparison, most Log Buffers anywhere in the world would not be much bigger than about 10MB, tops.

The Log Buffer is thus a relatively tiny area of memory in which a record of the changes you make to data is stored. Every insert, update or delete that you perform will generate a record of what row was affected and how it was modified. That record is written, in the first instance, to the Log Buffer component of the SGA. From there it is periodically saved to special files on disk, thus making the record of what transactions have taken place on a database permanent.

The transaction record is generically known as redo, and we can therefore say that redo is first written (for reasons of speed) into the Log Buffer of the SGA for each and every transaction that users perform.

As the name implies, redo is generated in order to let us 'do again' those transactions which might otherwise have been lost because of hard disk failure or power supply failure or software error or user stuff-up. Redo, in short, is Oracle's recovery mechanism, and the Log Buffer is simply the first place redo is created.
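Both the size of the Log Buffer and the amount of redo being generated are easy to check. A quick sketch:

-- The configured size of the Log Buffer, in bytes
select value
from   v$parameter
where  name = 'log_buffer';

-- How much redo has been generated since the instance started
select name, value
from   v$sysstat
where  name in ('redo entries', 'redo size');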

The Large Pool

The Large Pool is an optional component of the SGA, though these days it is so useful to have that I doubt many production databases run without one: optional it may be, strictly speaking, but I suspect most DBAs these days consider it functionally compulsory.

Whenever you parallelise an operation in Oracle, the individual parallel slaves need memory in which to communicate with each other. That memory is usually grabbed from the Shared Pool if nothing else is available: but if a Large Pool has been configured, it gets taken from that instead. You end up using parallel slaves whenever you run a parallel query, perform an RMAN backup or utilise multiple I/O slaves. Parallel query and multiple I/O slaves are slightly exotic (and cost money), but practically every Oracle database on the planet needs backing up with RMAN - so that means practically every Oracle database on the planet needs a Large Pool.

You can do all of these things without a Large Pool, but doing so means the Shared Pool is stressed. It's busy trying to minimise hard parses, and suddenly great gobbets of memory that it needs to do that job are pinched by a backup job or a parallel query! Not a good outcome, in short... and the existence of a Large Pool would mean that such a conflict never arises.

So let me make it simple for you: if you ever back up your database, you need a Large Pool. And that should apply to pretty nearly 100% of you, I would have thought!
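Checking whether you have one, and how big it is, is straightforward. A minimal sketch (note that with the automatic SGA management introduced in 10g, a parameter value of 0 can simply mean the size is being tuned for you):

-- The manually-set Large Pool size, in bytes
select value
from   v$parameter
where  name = 'large_pool_size';

-- What the Large Pool actually contains right now
select name, bytes
from   v$sgastat
where  pool = 'large pool';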

The Java Pool

Since Oracle 8i, it has been possible to write procedures, functions and other code which gets stored inside the database in the Java programming language. It has since turned out that you would be certifiably insane to do so, but nevertheless the capability remains. (Java is better utilised "in the middle tier" - that is, in an application server - rather than in the backend database itself.) If you run Java code in the database itself, it needs its own unique area of memory in which to execute - and that area of memory is called the Java Pool. It can be enormous (in the gigabyte range, for example) - but if you don't run Java in the database at all (all your code is, say, PL/SQL), then it can be zero in size.
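The same sort of check applies here. A sketch (again, under automatic SGA management a value of 0 may just mean 'auto-tuned'):

-- The Java Pool size, in bytes; genuinely zero if you never run Java
-- inside the database itself
select value
from   v$parameter
where  name = 'java_pool_size';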

The Streams Pool

New in 10g, a special area of memory within the SGA is now dedicated to allowing a database to replicate itself to a different location (think of a head office database which needs to be 'copied' up to a branch office out in the boondocks). If you don't need your database to replicate itself, you don't need a Streams Pool. Before 10g, replication took whatever memory it needed from the Shared Pool - and hence the business of preventing hard parses was compromised. Oracle 10g has recognised the significant memory requirements of effective replication and has thus created the Streams Pool as a dedicated resource which alleviates the burden on the Shared Pool.
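And, for completeness, the equivalent check for the Streams Pool (10g and above only):

-- The Streams Pool size, in bytes; zero if the database does no
-- Streams replication
select value
from   v$parameter
where  name = 'streams_pool_size';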

Summary

When the Oracle program runs, it grabs a large chunk of memory in which it can do its work. That 'chunk', in its entirety, is known as the System Global Area, or SGA. The SGA is not a monolithic chunk of memory, however: it has internal structure, and is internally chopped up into discrete areas of memory which perform specialised tasks. These sub-areas are known as the Shared Pool (which caches execution plans and the data dictionary tables), the Buffer Cache (which caches ordinary data), the Log Buffer (which records changes made to ordinary data), the Large Pool (which is needed for efficient memory allocations to parallel slaves and backup processes), the Java Pool (providing memory to Java stored procedures) and the Streams Pool (unique to 10g and above, and providing memory to replication services).
