DBA and Developer: Role Definitions - Designing queries used to access the database
Without Structured Query Language (SQL) providing programmable access to its data, a database is little more than an overhyped file system. Efficient, concise query design can markedly improve server performance over a dataset, so DBAs and developers should make it one of their top priorities. Developers should work through their SQL code and verify that it contains no inefficiencies that could slow database performance. DBAs, in turn, should review any code submitted to run in a live environment for such inefficiencies and send problem code back for review and alteration.
Systems are usually built with a production/development split: there are production (live) databases and development (dev) databases. Developers write and test their code in the development environment, ensuring that it runs with the least possible IO (input/output). The code is then submitted to the DBA, generally via change management, for production implementation. Here the DBA's role is to review the code and confirm that it is suitable to run on production without bogging down essential system operations.
Code can be optimized in many ways, from the removal of cursors and inline queries to the addition of indexes over complex datasets. Optimization should be in the court of the developer, who is closest to understanding what the data is and what is being accomplished. The DBA should review this code, but is also the deciding factor on whether indexes are truly required in a live environment, because the DBA understands the database architecture and how the differing systems integrate with one another.
Were developers given free rein over live datasets, even simple queries could bring a server to a standstill. For instance, suppose a group of five developers each creates a query referencing a single table. That table contains one hundred million rows of data across thirty columns, for a total of 3,000,000,000 fields. If each of the five queries references a third of those columns and all five run at the same time, the IO load could be such that any other transactions on the database would likely halt until the queries complete. Having a DBA oversee query implementation ensures that queries are scheduled to run at optimal times so that overall server performance is not drastically affected.
Overall, clear management of a database should come from the DBA. This requires that developers write their code efficiently and concisely, and that DBAs only approve code that meets said standards. Additionally, code should only be implemented in such a way that system performance is not hindered. The DBA’s viewpoint is essential in managing complex interconnecting systems which could likely interfere with each other given the opportunity.
The role of a DBA is both diverse and very specific. It is diverse in that a DBA can be in charge of many different databases or a very select few. Additionally, the role of the DBA can vary greatly from organization to organization in terms of responsibilities. In some organizations a DBA is placed at a senior management level due to the sheer amount of responsibility and power they have over the organization's data. In others, a DBA is a mid-level technician whose role is to support and maintain a smaller set of data which may or may not be core to the organization's operations.
In short, the path to becoming a DBA can be diverse. However, all DBAs must possess a specific set of qualifications and knowledge. A DBA is usually certified in the database architecture they are hired to administer. Additionally, they have a tertiary level of education, usually at the bachelor's level or higher. Finally, most organizations want their DBAs to have some business knowledge and background, which is why higher-paid DBAs often hold MBAs or appropriate business management certificates.
As for table structure and entity relationships, these are generally managed through normalization and denormalization planning and implementation. Depending on how the database is used, splitting large datasets so that transactions require the fewest IO operations per table per user can tremendously increase database performance. This is usually done in a transactional database, where a large user base is likely to access data simultaneously on an ad-hoc basis. An example is the backend database of a point-of-sale system.
Denormalization usually takes place in data warehouses. Small, diversified transactional tables are brought together into large reporting tables for ease of business intelligence gathering. These datasets are kept in a denormalized state because the flattened, logical table structures make the data easier to understand. In other words, denormalized databases are much more human-friendly than transactional databases.
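As an illustration, a denormalized reporting table might be built by joining several transactional tables into one wide table; all table and column names here are hypothetical:

CREATE TABLE SALES_REPORT AS
SELECT o.ORDER_ID, o.ORDER_DATE, c.CUSTOMER_NAME, p.PRODUCT_NAME, i.QUANTITY
FROM ORDERS o
JOIN CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
JOIN ORDER_ITEMS i ON i.ORDER_ID = o.ORDER_ID
JOIN PRODUCTS p ON p.PRODUCT_ID = i.PRODUCT_ID;

A business analyst can now query SALES_REPORT directly without having to reconstruct the joins between the underlying transactional tables.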
Entity relationships in both designs take the form of primary and foreign keys. A foreign key in one table references the primary key of another, linking the tables via distinct attributes that exist across multiple table structures. These keys can be a single distinct attribute or a unique concatenated (composite) key.
In a transactional database, tables are split out into their most efficient normalized state. That is not to say they are completely normalized, but that they are normalized to the extent that any further normalization would decrease transaction performance rather than increase it. This is done through the use of primary keys, link tables, and the removal of many-to-many relationships, as sketched below.
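For example, a many-to-many relationship between orders and products can be resolved with a link (junction) table; the names below are hypothetical:

CREATE TABLE ORDERS (ORDER_ID NUMBER PRIMARY KEY, ORDER_DATE DATE);
CREATE TABLE PRODUCTS (PRODUCT_ID NUMBER PRIMARY KEY, PRODUCT_NAME VARCHAR2(100));
-- Link table: each row pairs one order with one product,
-- replacing a direct many-to-many relationship
CREATE TABLE ORDER_ITEMS (
  ORDER_ID NUMBER REFERENCES ORDERS (ORDER_ID),
  PRODUCT_ID NUMBER REFERENCES PRODUCTS (PRODUCT_ID),
  QUANTITY NUMBER,
  PRIMARY KEY (ORDER_ID, PRODUCT_ID)
);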
It is the responsibility of the DBA to ensure table structures are maintained such that transactional systems operate in an efficient and timely manner. Developers need not know the full rationale behind the table structure layout (although this surely helps in the long run); their concern is with query efficacy and efficiency. In essence, the developer looks to get the job done, and the DBA looks to get the job done right.
Burleson Consulting. (2012). How to become an Oracle DBA. Retrieved from Burleson Consulting: http://www.dba-oracle.com/t_how_to_become_an_oracle_dba.htm
This lab demonstrates the process of installing Oracle 11g on Windows Server 2012.
Database Tablespace: Static Schema versus Dynamic Schema
The primary difference between a schema whose structure changes very little over time (that is, one that grows at a slow or negligible rate) and a schema that may grow rapidly or even exponentially lies in the initial TABLESPACE parameters for the schema itself. Initially defined, the slow-growing EMPLOYEE schema, on database ORCL, is given parameters along the following lines:
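(A minimal sketch; the tablespace and file names are assumptions based on the description that follows.)

CREATE TABLESPACE EMPLOYEE_TS
DATAFILE 'ORACLE_BASE\ORCL\employee01.dbf' SIZE 5M
AUTOEXTEND OFF;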
This TABLESPACE is not allowed to grow and is limited in size to five megabytes. In essence, it is not foreseen that the EMPLOYEE schema will ever grow beyond this size; that is, data entered into the EMPLOYEE tables will never exceed the five-megabyte limit.
A dynamic schema whose storage requirements will grow over time, and whose limits are unknown, needs an entirely different parameter base. For instance, a transactional schema named SALES, on database ORCL, would require parameters along these lines:
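(Again a sketch; the file name and initial size are assumptions, while the SALES_TS name and the BIGFILE and AUTOEXTEND settings follow from the discussion below.)

CREATE BIGFILE TABLESPACE SALES_TS
DATAFILE 'ORACLE_BASE\ORCL\sales01.dbf' SIZE 100M
AUTOEXTEND ON MAXSIZE UNLIMITED;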
SALES will now have the ability to store data until there is no more physical space to store it. A BIGFILE TABLESPACE essentially tells Oracle that the schema in question can store data indefinitely, so long as AUTOEXTEND is on. There is an upper limit (on the order of 32 to 128 terabytes, depending on block size), but this should not be an issue for a very long while. A further advantage is that all of the SALES data is contained in a single TABLESPACE, reducing the amount of management required for backing up and restoring the SALES_TS TABLESPACE.
Current and future Oracle systems no longer require a STORAGE parameter in CREATE TABLE syntax for any locally managed TABLESPACE. In effect, using STORAGE in a CREATE TABLE statement is redundant and will not actually do anything. Oracle still supports the syntax because some older code requires it, but it has no effect on current TABLESPACE structures. Moreover, assigning PCTUSED and PCTFREE is discouraged, as the Oracle system should be managing storage automatically (Powell & McCullough-Dieter, 2010, p. 307).
Row chaining and row migration are to be avoided as much as possible. Row chaining occurs when the physical data of a given row is carried across multiple Oracle blocks: data inserted or updated on a row does not fit in its data block, so it spills over into an adjacent block. Migration, on the other hand, occurs when data would fill a block past the given PCTFREE mark. That is, unlike chaining, where the data simply does not fit in a block, this data does fit but would push the block past its PCTFREE threshold. Oracle therefore moves the row to another block and leaves behind a pointer to the new location.
In modern Oracle (10g and later), "automated segment space management is encouraged, and thus free space is managed by Oracle" (Powell & McCullough-Dieter, 2010, p. 307). In other words, row chaining and row migration should not be an issue in modern locally managed TABLESPACEs. They are only a concern with older databases whose TABLESPACEs are dictionary-managed.
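If chaining or migration is suspected, it can be checked with ANALYZE, which (unlike DBMS_STATS) populates the CHAIN_CNT column; the schema and table names here are assumptions:

ANALYZE TABLE EMPLOYEE.EMPLOYEES COMPUTE STATISTICS;
SELECT TABLE_NAME, CHAIN_CNT FROM DBA_TABLES
WHERE OWNER = 'EMPLOYEE' AND TABLE_NAME = 'EMPLOYEES';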
Storage space for individual tables, and table creation generally, is managed through the TABLESPACE structure of a given user schema. Trying to manually manage the storage of each individual table in a schema is discouraged in Oracle 10g and later. However, defining the starting storage available to a given schema's TABLESPACE, and how much that storage is allowed to grow, is supported. This reflects current and future practice for locally managed TABLESPACE storage.
In terms of determining how much space to provide for object expansion (that is, schema-allocated TABLESPACE storage), one must determine the schema's usage now and into the future. In this regard, the best question to ask a developer is, "What are your perceived storage space requirements for the entire schema named 'x', and how much do you think that will grow over time?" Based on this information, the initial TABLESPACE size and an AUTOEXTEND MAXSIZE can be defined.
For instance, given that a schema named CARDS_MGMT on database CORE will be tracking EFT cards and transactions indefinitely, a TABLESPACE can be created with the following:
CREATE TABLESPACE CARDS DATAFILE 'ORACLE_BASE\CORE\cards01.dbf' SIZE 100M AUTOEXTEND ON MAXSIZE UNLIMITED;
From here the schema can be assigned this TABLESPACE for default usage:
ALTER USER CARDS_MGMT DEFAULT TABLESPACE CARDS;
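For the schema to actually create objects in its new default TABLESPACE, it also needs a quota there; a minimal example:

ALTER USER CARDS_MGMT QUOTA UNLIMITED ON CARDS;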
Since the user schema CARDS_MGMT now has a TABLESPACE specifically assigned to it, the CREATE TABLE syntax no longer requires a STORAGE component. Moreover, since the TABLESPACE is locally managed, any STORAGE syntax within a CREATE TABLE query is simply ignored, as it applies only to DICTIONARY-managed TABLESPACEs, which Oracle has deprecated.
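A table created under this schema therefore needs no STORAGE clause; its segment lands in the CARDS tablespace by default (the table and columns below are hypothetical):

CREATE TABLE CARDS_MGMT.CARD_TRANSACTIONS (
  TXN_ID NUMBER PRIMARY KEY,
  CARD_NUMBER VARCHAR2(19),
  TXN_DATE DATE,
  AMOUNT NUMBER(12,2)
);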
What the created TABLESPACE does is allocate 100 megabytes of storage for the initial cards01.dbf file. This file can then auto-grow, or AUTOEXTEND, to unlimited space as future CARDS_MGMT transactional updates and inserts demand. From a DBA perspective, this means all storage is maintained in a single manageable location for the CARDS_MGMT schema. Therefore, if the file needs to be moved, changed, or restored, the DBA knows where it is and how big it is likely to be.
Ostensibly, TABLESPACE specifications would allow database storage to be allocated based on transactional requirements, e.g. high-frequency transactional data could be placed on fast storage while low-frequency non-transactional data could be placed on slow storage. However, in current network architectures with SAN storage and virtual environments, this can be done on an ad-hoc basis, invisible to the database structure itself. This means that storage can be micromanaged based on load-balancing requirements.
Extract: The primary concern with having too many indexes is storage requirements. Each index essentially doubles the required storage space for each indexed column. This in turn can cause lookups across multiple indexes to slow down as Oracle blocks become full and index leaves (B-tree) become widespread. However, even when storage is not an issue (that is, space can be allocated dynamically and expanded indefinitely), too many indexes can still cause problems.
Extract: A B-tree reverse key index, as recommended by Oracle, should be used specifically for Oracle Real Application Clusters, that is, on databases that require high levels of simultaneous transactions. Unlike a normal B-tree index, every entry in the index, except the ROWID, is reversed. These indexes are only used by the Oracle optimizer when a single record search or a full index scan is required (Powell & McCullough-Dieter, 2010, p. 435), that is, when multiple I/O connections are trying to read and/or write to the same Oracle block of data.
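Creating one only requires the REVERSE keyword; the table and column names here are assumptions:

CREATE INDEX EMPLOYEE_ID_RIX ON EMPLOYEE.EMPLOYEES (EMPLOYEE_ID) REVERSE;

Reversing the key bytes spreads sequential values (such as those generated by a sequence) across different leaf blocks, which is what relieves block contention in RAC environments.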
This lab demonstrates loading data into tables from a flat file, examining PL/SQL procedures and triggers, and monitoring the performance of the database. Included in the zip file are the lab assignment and all SQL source code used in the lab.
This lab demonstrates the creation and administration of roles in Oracle 11g, the dangers of the CASCADE command, and how to diagnose ORA errors.
Database Auditing: Ensuring Updates are Authorized
As the specific table in question is the EMPLOYEES table, I would recommend placing all audit functions around this object. Additionally, since the specific worry is unauthorized changes to SALARY information, auditing should be limited to UPDATE statements that succeed. This ensures the least amount of audit information is stored with the greatest effect.
The commands to do this are as follows:
CONNECT SYS AS SYSDBA
ALTER SYSTEM SET AUDIT_TRAIL = 'db' SCOPE=SPFILE;
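-- AUDIT_TRAIL is a static parameter; restart the instance (SHUTDOWN IMMEDIATE, then STARTUP) for the change to take effect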
CONNECT SYSTEM/<<password>>
AUDIT UPDATE ON EMPLOYEE.EMPLOYEES BY SESSION WHENEVER SUCCESSFUL;
To view the audit afterwards:
SELECT * FROM DBA_AUDIT_OBJECT
WHERE OBJ_NAME = 'EMPLOYEES';
Once the culprit is found, or ruled out, the audit can be shut off with:
NOAUDIT UPDATE ON EMPLOYEE.EMPLOYEES WHENEVER SUCCESSFUL;
With this in place, any successful update on the EMPLOYEES table causes an audit record to be written, viewable through the DBA_AUDIT_OBJECT view. These records are written once per session and only when the change succeeds. It is likely that the majority of changes will be authorized, or will be rolled back rather than committed. Since only one table is audited, for one type of DML command, the volume of audit data should be kept to a bare minimum.
Extract: Ensuring database uptime is the primary concern for a DBA, and as such, disaster recovery (DR) planning should be taken quite seriously when it comes to protecting system data integrity. Appropriate planning and implementation of a full-scale DR solution should be readily available and up to date at all times. This solution should outline every aspect of restoring the database and its data to a fully operational state, from scratch. Additionally, it should be written so that any staff member could implement it in the event that no IT personnel are available.