Oracle Performance Management

Sometimes Your Eyes Get in The Way of Your Vision

It is an odd little phrase, but it is true. Those of us who have been proficient in an area of expertise for a number of years should understand its meaning. Over time, we accumulate numerous experiences with a product. Eventually, with enough of these experiences, we understand the product from the inside out, all without the benefit of knowing the product code.

So, what does this all have to do with eyes and vision? Well, in technology it is tempting to routinely depend on product documentation and technical support, which are most often immensely helpful. From time to time, however, the solution is not yet documented or contained within the traditional troves we are accustomed to turning to. What then? We are left to dig deep into our own experiences in search of solutions. We cannot see these solutions because they have not presented themselves before now. Therefore, we have to envision a solution based on our cumulative experiences, both successful and unsuccessful. Solutions often ‘come to us’ in the form of ‘it feels right’, even when we are unable to provide concrete, tangible rationale.

A recent real-life example made trusting my instincts fairly easy. A client was experiencing a sporadic error while converting a standby (Data Guard) database from read-only to read-write. The error was an ORA-00600 internal error indicating that the SCN was too small. A search of Google and MetaLink did not reveal much that I could use. An Oracle SR was created and assigned to the Oracle development team, who were charged with diagnosing the issue. In the meantime, the issue increased in frequency and in the number of databases it occurred in. At that point, I had no references I could rely on. I knew the process involved (a custom shell script invoking SQL*Plus and RMAN) and how the conversion occurs from a scripting point of view. After a little while, I remembered from long ago that shutting down and starting a database in quick succession is not always a good thing. Oracle deems its processes either critical or non-critical, and a database/instance is considered up and running when its critical processes are up and running. However, certain non-critical Oracle processes become effectively critical if the database requires them. The client’s database uses all kinds of options and features, which in turn use Oracle processes. No doubt some of these processes (standby redo apply, for example) are treated as critical by the database even if Oracle itself does not consider them critical. To provide a work-around, I inserted ‘sleep’ commands after database starts and stops to ensure all processes were up and running before the database was shut down again. That seemed to do the trick and will remain in place until Oracle provides a more permanent solution. Oracle did not have a solution and Google did not have a solution, so a solution was not visible. I was able to ‘envision’ one based on various experiences over the years.
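
To illustrate the shape of the work-around, here is a minimal sketch of a shell wrapper with ‘sleep’ pauses; it is not the client’s actual script, and the conversion commands and sleep durations are placeholders of my own.

#!/bin/sh
# Minimal sketch (not the client's actual script): the point is simply the
# 'sleep' after each shutdown/startup so that all background processes
# (redo apply, etc.) are fully down or up before the next step runs.

sqlplus -s / as sysdba <<'EOF'
shutdown immediate
EOF

sleep 60   # arbitrary pause: let all background processes exit cleanly

sqlplus -s / as sysdba <<'EOF'
startup mount
alter database convert to snapshot standby;
alter database open;
EOF

sleep 60   # arbitrary pause: let everything start fully before any later shutdown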

Written by jpaugustin

July 25, 2015 at 11:07 PM

Posted in Uncategorized

Confessions of a Compulsive Tuning Disorder (CTD) Addict

From time to time I read an article or a blog posting about “Compulsive Tuning Disorder (CTD)”, usually in a mocking or otherwise unflattering context. I give thanks daily for the sense of humor I enjoy, even if no one else understands me while I am in a state of delirious giggles and laughter.

Sure, occasionally I succumb to the temptation of tuning just one more query in the pursuit of happiness when it is difficult to justify. For the vast majority of cases, however, I am able to justify the return on investment (ROI) for each and every tweak and turn of the knob. Take a new index added to enhance query performance; its cost is fairly easy to estimate. Each index requires approximately three (3) logical reads (LIO) for each DML that affects it. Therefore, on a table that receives one (1) million DMLs, the approximate cost of each affected index is three (3) million LIO. Suppose that, by measuring the effectiveness of the new index on the query, we find that the index reduces the query’s LIO by one (1) million. Its cost is then greater than its benefit, and the new index may not be worth the investment. However, what if the new index made it possible for the nightly batch schedule to complete 10 minutes sooner, and the nightly batch schedule is in danger of exceeding its SLA, which would result in financial penalties? Would that change things? I do believe that SLA (and other downstream) considerations must be included in performance tuning goals and objectives. In these situations, aggressive tuning is required even if the reasons are not immediately obvious.
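
For those who want to quantify the benefit side of that equation, here is a minimal sketch using the SQL*Plus AUTOTRACE facility to compare a query’s logical reads (the ‘consistent gets’ statistic) before and after creating an index; the connection string, table, column, and index names are placeholders, and AUTOTRACE assumes the PLUSTRACE role (or DBA privileges) is available.

#!/bin/sh
# Hypothetical sketch: compare a query's logical I/O before and after adding an index.
# CONNECT_STRING (e.g. user/password@db), table ORDERS, column CUSTOMER_ID, and
# index ORDERS_CUST_IX are placeholders.

sqlplus -s "$CONNECT_STRING" <<'EOF'
set autotrace traceonly statistics

-- baseline run: note the "consistent gets" figure
select * from orders where customer_id = 42;

create index orders_cust_ix on orders (customer_id);

-- second run: compare the new "consistent gets" figure against the baseline
select * from orders where customer_id = 42;

set autotrace off
EOF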

For those who subscribe to the belief “if it ain’t broke, don’t fix it”, we will have to continue to disagree. I have always been taught, and believe, that an issue should be addressed and resolved before it becomes a problem. Crisis mode is hardly an environment I would characterize by the level-headed, common-sense thinking required to deliver sound and effective solutions. As DBAs, we have the unique ability and tools to see issues long before anyone else, and certainly before they approach crisis level. It is far more effective and efficient to fix an issue before it becomes a problem. For each identified issue bubbling below the surface, we need to assign a value or priority that characterizes its impact, its cost to remediate, and the cost of doing nothing.

Here is another real-life example. Company XYZ has a staff of 100 service representatives who assist customers with issues. Each service representative handles 300 calls a day, each of which begins with retrieving the calling customer’s information. Let’s say the customer information query takes three (3) seconds to complete. Not too bad, and not an issue since no one is complaining, right? Three (3) seconds per query times 300 calls a day is 900 seconds (15 minutes) for one service representative. Still not too bad, right? Multiply the 15 minutes by 100 service representatives and you get 1,500 minutes (25 hours). Now, that seems like a lot! What if the three (3) second query could be reduced to sub-second (0.1 seconds, for example)? With the improved query, wait time per service representative drops to 30 seconds a day, and to less than an hour for all 100 service representatives combined.

Extend the above scenario to 100 queries and suddenly there is a huge impact on the entire system, including each and every user (internal and external) requesting services from the database. Less LIO implies fewer physical reads (PIO) and less CPU time. Fewer PIO could allow a less robust I/O storage subsystem. Less CPU time should provide a more consistent experience over time. Less CPU time could also mean fewer CPUs and fewer database software licenses, which could translate into noticeable savings year after year.

So, life is not always what it appears to be, and the same is true for tuning. We need to be able to accurately weigh the cost of remediation against the cost of doing nothing on a total, system-wide basis if we are to determine the correct course of action. Even so, a person dedicated to performance and efficiency often finds it difficult to settle for less than optimal results when they can improve the situation at little to no risk or cost.

Written by jpaugustin

November 4, 2013 at 3:22 PM

Posted in Uncategorized

Database Segment Statistics – METHOD_OPT

Database segment statistics are information (metadata) about segments (tables, indexes, etc.) and their columns. Statistics comprise various counts (cardinalities) related to segments and columns, such as table row count, index blocks per key, and the number of table rows where a given column value equals ‘XYZ’ (for example). Statistics are the primary source of information used by the optimizer to determine a SQL statement’s optimal execution plan.
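
As a quick way to see what has been collected, the following sketch queries the standard dictionary views for table-level and column-level statistics; the connection string, schema APP, and table ORDERS are placeholders.

#!/bin/sh
# Hypothetical sketch: inspect existing table and column statistics.
# CONNECT_STRING (e.g. user/password@db), schema APP, and table ORDERS are placeholders.

sqlplus -s "$CONNECT_STRING" <<'EOF'
-- table-level statistics: row count, blocks, and when they were last gathered
select table_name, num_rows, blocks, last_analyzed
  from dba_tab_statistics
 where owner = 'APP' and table_name = 'ORDERS';

-- column-level statistics: distinct values, nulls, and histogram type
select column_name, num_distinct, num_nulls, histogram
  from dba_tab_col_statistics
 where owner = 'APP' and table_name = 'ORDERS';
EOF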

As in everything Oracle, there are multiple ways and methods to accomplish anything, and the same is true for segment statistics gathering. And, as always, it depends on what you are trying to accomplish and what the constraints are. Sometimes the varied methods are the result of newer methods replacing legacy methods, and sometimes not. For segment statistics gathering, let’s assume the method is the Oracle-supplied package DBMS_STATS. The variances in this method mostly reside within the ‘METHOD_OPT’ option.

An interesting and controversial entry, ‘Two common mistakes using METHOD_OPT in DBMS_STATS’, was created recently in the LinkedIn forum ‘Oracle Database Performance Tuning’ by Carlos Sierra (the SQLT author). Carlos asserts that every column should be included in the histogram collection activity, not just the indexed columns. Those views are practically universal in the performance tuning community and are considered a best practice for database maintenance. And who could argue with that? If the optimizer primarily uses segment statistics to determine a SQL statement’s optimal execution plan, then more statistics must be better.

Below the surface lie the requirements and constraints that Oracle refers to with its time-tested phrase ‘it depends’.

What are the database performance requirements? If the database has very stringent performance requirements, then the statistics gathering process must be comprehensive (histogram collection on all columns) to help ensure that the most optimal execution plan is generated each and every parse (consistent). Otherwise, it may be acceptable to collect histogram information on only the indexed columns.

What is the database tier designation? If the database is associated with a top tier application function, then the statistics gathering process must be comprehensive (histogram collection on all columns) to help ensure that the most optimal execution plan is generated each and every parse (consistent). Otherwise, it may be acceptable to collect histogram information on only the indexed columns.

What are the scheduling constraints? If the database is idle on weekends or throughout the night, it may make sense to be comprehensive and collect histogram information on all columns. However, if statistics gathering negatively impacts normal database operations at all times, then it may make sense to collect histogram information on only the indexed columns to minimize the negative impact.

What are the CPU constraints? Similar to the scheduling constraints above, except that this constraint is specific to CPU time. This is not normally an issue these days, given how capable modern CPUs are. Persistently elevated CPU levels typically indicate a poorly performing system in need of attention.

What are the I/O constraints? Similar to the scheduling constraints above, except that this constraint is specific to disk activity. It is fairly common for applications/databases to be active during the night, particularly due to daily batch-type processing. It seems, more often than not, that batch processing is where the majority of database I/O is consumed; typically, large amounts of data are processed, often involving full table scans and the like. Perhaps the weekends are available for high-I/O processing like statistics gathering. If I/O is always an issue, then it may make sense to collect histogram information on only the indexed columns.

What are the disk space constraints? With little exception this is a non-issue. Disk drive costs are competitive enough that their cost is minimal relative to the benefit that comprehensive statistics gathering delivers.

What is the setting of the database configuration parameter related to bind variable peeking? If the database has ‘bind variable peeking’ disabled and the application uses bind variables rather than literals, then column histograms are not used by the optimizer when generating execution plans. In this scenario, why collect the information if it is not used? Perhaps to be inspected manually while diagnosing performance issues, but is all the extra work worth it? How frequently the histogram information is used, and the measured impact on database performance during collection, should help answer that question.

How much ‘people’ time is available for diagnosing performance issues? People time is the most expensive IT cost year after year, but if there is spare time available, then a less aggressive position may be taken by collecting histograms on only indexed columns. I would think that individuals are less than eager to announce that they are under-utilized unless there is something more appealing in their sights.

So, after all of the above (and much more), what is the answer to the simple question ‘How should METHOD_OPT be set’?
In short, if the database is able to absorb the additional overhead, collect histogram information on all columns. It is the best avenue for the optimizer to generate the most optimal execution plan and it is a good standard for engineered systems. Exceptions to the engineered standard can be managed one database at a time. However, as in all things Oracle, it depends on the database’s unique requirements and constraints.
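
For concreteness, here is a minimal sketch of the two METHOD_OPT choices discussed above, invoked through a shell wrapper; the connection string, schema APP, and table ORDERS are placeholders, and you would pick one form (or a per-table variation) based on your own requirements and constraints.

#!/bin/sh
# Hypothetical sketch: the two METHOD_OPT variations discussed above.
# CONNECT_STRING (e.g. user/password@db), schema APP, and table ORDERS are placeholders.

sqlplus -s "$CONNECT_STRING" <<'EOF'
-- comprehensive: let Oracle consider histograms on all columns
begin
  dbms_stats.gather_table_stats(
    ownname    => 'APP',
    tabname    => 'ORDERS',
    method_opt => 'FOR ALL COLUMNS SIZE AUTO');
end;
/

-- lighter weight: restrict histogram collection to indexed columns only
begin
  dbms_stats.gather_table_stats(
    ownname    => 'APP',
    tabname    => 'ORDERS',
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO');
end;
/
EOF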

Written by jpaugustin

August 14, 2013 at 12:04 PM

Posted in Uncategorized

What is performance?

For a couple of months I wondered whether I should dare attempt to illustrate performance goals in real-life terms. I was fortunate enough to attend the Enkitec E4 conference in Dallas this past week, where Tom Kyte (Ask Tom) gave the first-day keynote address. He mentioned that he drives a Toyota Prius and referenced it in the keynote. So, if it is okay for Tom Kyte to mention his Prius, maybe it is okay for me to mention mine.

You see, I drive a Toyota Prius and believe that it offers the best efficiency and value over the life of the car. Alternatively, a Porsche’s performance is undeniable in getting from point A to point B. A Prius costs $28K, goes from zero to 60 MPH in 12 seconds, and gets 50 MPG. A Porsche, on the other hand, costs $100K, goes from zero to 60 MPH in less than four (4) seconds, and gets 19 MPG.

The question that Tom Kyte asked, and that I have been pondering for a while now, is which one is the better performer. I have come to believe, as with most things, that it depends on what is desired and what the requirements are. Is the need to get from point A to point B as fast as possible, or as efficiently as possible? As in everything Oracle, the answer is ‘it depends’. In our project management studies we learn that the project approach is determined by the constraints: the resources available (time and cost).

Products (software, architecture, etc.) such as online stores, which require minimal response times and high availability, probably require a Porsche to run their transactions. Well-known industry metrics exist to justify the cost of high-end architecture that ensures a robust system, making it easy to attract and retain customers over time. These metrics commonly spell out and quantify how much each minute of downtime costs the business.

On the other hand, most applications are not designated as ‘top tier’ and can therefore tolerate less than stellar performance. These may fall into the category of a Prius: elegant and efficient, yet not a stellar performer.

In essence, these are the same kinds of decisions businesses make every day: a value-based approach in which each investment is evaluated for its ROI (return on investment) and its ability to generate revenue. IT organizations need to embrace and support this approach if they are ever to be seen as a welcome and integral part of the business. IT must work with the business owners and stakeholders in order to achieve optimal value from the IT investment. In-sourcing or out-sourcing, infrastructure architecture, and staffing levels are all part of the formula for achieving a highly efficient and highly performing organization.

As it relates to performance management, invest mostly in those applications (and associated infrastructure) that are most critical to the business. Top tier applications require redundancy and high availability measures (Exadata, RAC, Data Guard, Golden Gate, etc.) to help ensure business continuity even in the event of a catastrophic failure. Architects and performance specialists are needed to ensure superior performance levels. Applications that are not designated as ‘top tier’ are able to function satisfactorily at excellent value on a less redundant architecture with less than stellar performance.

Written by jpaugustin

August 9, 2013 at 4:59 PM

Posted in Uncategorized

DBAs, Know Your Application

Due to a recent experience, I feel compelled to dive a little deeper to explore a specific topic. Some years ago during a job interview I was asked if I thought it was necessary to know the application that I support as a DBA. Not being totally naive, I understand well that there are at least two positions: 1) those that believe a database is a database and a DBA can support any database, and 2) those that believe it is essential to understand the supported application.

I guess both positions are technically correct, but what DBA of value does not desire to support their database (and thus the application) to the best of their ability? Sure, application knowledge is not required to perform a backup, monitor the database using standard metrics, or add disk space as needed. However, application knowledge is required to support and enhance the application beyond rudimentary service levels.

And this is where we get to my recent experience. Approximately five years ago, I developed a partitioning and ILM strategy at a fairly large client. I compiled a list of the largest tables, forecast their near- and long-term growth, and settled on approximately 25 tables. That was the easy part; the actual partitioning strategy still had to be determined. I located a ‘date time’ column that was used as filtering criteria in many SQL statements and whose value did not change. Also, the client wanted to simplify their archival selection criteria and therefore based it solely on the ‘date time’ column.

Next was the decision on the number of partitions, or the interval of time between them. Only after researching the data access patterns for a period of time did I learn that the vast majority of queries select data from within the past week. This and many other observations were confirmed by the client in joint design sessions. Knowing that tables normally consume approximately 90% of the total physical reads in a database, and this database was no different, it seemed important to get more table data into the buffer cache to reduce the amount of physical reads. With this information, I experimented with weekly partitions. The current and prior weeks’ partitions were pinned to the KEEP buffer pool, whereas the other partitions were left in the DEFAULT pool.

The above strategy worked very well. The client was looking for a way to rid its OLTP database of older legacy data, thereby reducing overall disk space costs and perhaps gaining a little performance in the process. The ILM strategy, designed and developed in tandem with the partitioning exercise, considerably reduced disk space costs by migrating to a more cost-effective disk solution and by using Oracle basic compression to reduce the disk space footprint. A noticeable performance benefit was realized due to the improved clustering of table data and reduced index size. Everyone was happy, as this was a successful exercise.
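
To make the shape of that design concrete, here is a minimal sketch of weekly interval partitioning combined with KEEP-pool pinning and basic compression; the connection string, schema, table, column, and partition names are hypothetical, and the original implementation certainly differed in detail.

#!/bin/sh
# Hypothetical sketch of the design described above (all names are placeholders):
# weekly partitions keyed on a stable date/time column, recent partitions pinned
# to the KEEP buffer pool, and older partitions compressed with basic compression.

sqlplus -s "$CONNECT_STRING" <<'EOF'
-- weekly interval partitioning on the immutable date/time column
create table app.activity_log (
  activity_id   number,
  activity_time date not null,
  payload       varchar2(4000)
)
partition by range (activity_time)
interval (numtodsinterval(7, 'DAY'))
(partition p_initial values less than (date '2013-01-07'));

-- pin the current and prior weeks' partitions to the KEEP buffer pool
-- (a maintenance job would supply the appropriate date literals each week)
alter table app.activity_log modify partition for (date '2013-04-15') storage (buffer_pool keep);
alter table app.activity_log modify partition for (date '2013-04-08') storage (buffer_pool keep);

-- compress an older partition with basic compression as part of ILM
-- (local indexes on the moved partition would need rebuilding afterwards)
alter table app.activity_log move partition p_initial compress;
EOF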

Now, move ahead 5 years. A new consultant to the project, unaware that I was the partition designer, went on and on about how dreadful the design was. Weekly partitions were simply too much and he could not believe someone would do such a thing. I patiently listened hoping to learn something new from a fresh face.

Without understanding the application and the premise for the design, how can anyone understand why things were done the way they were? In this situation, the ‘bull in a china shop’ approach would have been a mistake. I still believe the original design was optimal for meeting and surpassing client expectations, although I am open to differing opinions. Providing top-tier results (partitioning and ILM) requires an in-depth understanding of the subject matter (the application and its data access patterns). This is but one of many instances where application knowledge and insight are absolutely essential to high-quality, long-term design. I feel very strongly about this and foresee myself expressing those feelings more in subsequent blogs.

Written by jpaugustin

April 20, 2013 at 9:56 PM

Posted in Uncategorized

Enkitec Extreme Exadata Expo – E4 – 2013

I am very happy to help announce the second annual Enkitec Extreme Exadata Expo, to be held in Irving, Texas on August 5th and 6th of 2013. The venue will be the Four Seasons Hotel and Resort. Last year was the event’s inaugural year, and yet I found it so professionally staged and managed that one would be convinced it was a mature production with several seasons behind it. I simply cannot imagine how last year could be topped.

Imagine, if you will, all of the greatest Oracle minds in one location, openly discussing current and future technology trends. I was the proverbial ‘fly on the wall’ but was awe-struck nonetheless.

Consider expo attendance mandatory for any Oracle professional who is deeply serious about and committed to the craft and their career.

More information, including registration, can be found at http://www.enkitec.com/e4

 


Written by jpaugustin

February 17, 2013 at 3:29 PM

Posted in Uncategorized

Performance Management Demystified

Performance Management is a very intriguing subject and carries various specific definitions depending on the object being analyzed. In any event, the common thread revolves around, and is fundamental to, the very simple question: ‘How can I increase productivity using relatively fewer resources?’

This one simple question is the primary driver and motivator for all processes, abstract and concrete. Think about it: corporations are in business to make money. In order to maximize profits, businesses need to produce more at a lower cost than the competition. Sales teams need to sell more products (or higher-quality products) using fewer people. More freight needs to be shipped using fewer resources and/or less time. Power plants need to produce more power using fewer resources. Teams need to be assembled and managed according to their production requirements. People need to be able to produce more at lower cost. Networks, computers, application design, software code, etc. all follow this same thought process of getting the most productivity for the least cost.

With the above view in mind, it is impossible to consider Performance Management (tuning) as an isolated activity without considering the object’s various interface points. The ‘holistic’ view must be factored in whenever contemplating a modification to any part of the whole. For example, increasing the degree of parallelism may benefit one query, but it becomes a liability when it comes at the cost of every other query. Perhaps there are insufficient resources (CPU, network bandwidth, or file I/O throughput) available to support both the improved query and the rest of the system at the time the query runs.

The never-ending Performance Management goal is always elusive in that when a bottleneck is removed or minimized, a new bottleneck appears. So when is the system at its peak? The holistic (or supply-and-demand) approach answers that question as well: the Performance Management initiative for the subject at hand ends when the requirements have been met. And with that, the requirements are telling me that this blog entry has fulfilled its requirements and is therefore terminating.

Written by jpaugustin

December 9, 2012 at 10:41 PM

Posted in Uncategorized