Oracle Database 23ai: Backup and Recovery - Part 2

Oracle University Podcast

12-11-2024 • 15 mins

Lois Houston and Nikita Abraham continue their deep dive into Oracle Database 23ai backup and recovery strategies with Senior Principal Database & MySQL Instructor Bill Millar.
Picking up from Part 1, they explore critical concepts such as instance recovery, checkpoint processes, and the role of redo log files. Bill shares insights into complete and incomplete recovery, flashback technologies, and lots more.
Oracle University Learning Community: https://education.oracle.com/ou-community
Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode.
--------------------------------------------------------
Episode Transcript:

00:00

Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started!
00:26

Nikita: Welcome back to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs.

Lois: Hi everyone! Last week, we had a fantastic chat with Bill Millar, our Senior Principal Database & MySQL Instructor. We dug into the basics of backup and recovery. We touched on everything from a DBA’s role in preventing data loss to handling different types of failures, and even some common mistakes that tend to pop up when managing a database.

Nikita: Yeah, if you missed that episode, definitely go back and check it out. It’s packed with useful info, especially if you’re in charge of keeping databases safe.

01:10

Lois: Today, we’re picking up where we left off. We’re going to ask Bill about instance recovery and recovery strategies. Bill, can you kick things off by explaining what instance recovery is?

Bill: You can understand instant recovery by becoming familiar with the checkpoint process, the redo log files, and the role of the log writer with the redo log files.

Automatically instance or crash recovery. What is it doing? What are the phases of instance recovery? How we possibly can tune that instance recovery. We can use the mean time to recovery advisor that can help us determine how we might tune the instance recovery.

01:51

Nikita: OK, so let’s go through some of these concepts and procedures you mentioned. What is the checkpoint process responsible for exactly?

Bill: The checkpoint process itself, it's responsible for updating the data file headers with checkpoint information. When a checkpoint is taken, it is going to write into the controlfiles. It tells the DB writer to write. DB writer writes to the data files, and the checkpoint is also annotated in the data files.

So updating controlfiles with that checkpoint information also, controlfiles and database files. It signals that DB writer at full check points again, hey, it's time to write. So that way, it has the latest data written to the data files. The controlfile and datafiles, those are in sync with that.

02:40

Lois: Bill, what about the log writer process and the redo log files?

Bill: With the log writer process and the redo log files, the redo log files record the changes to the database himself. It should be multiplexed.

02:53

Nikita: What do you mean by that?

Bill: More than one redo log group. Now, the redo log groups, it is recommended that they should be multiplexed. Each group member should be on a different disk or in a different disk group if you're using ASM.

03:10

Nikita: And why is that, Bill?

Bill: Because if I lose one, if I lose one redo log group, one member, I can continue to operate with just the one. If I only have one redo log group member and the system comes around and tries to write to it, then my system is going to come to a halt.

So the log writer is going to write to those redo logs whenever somebody does a commit. When that redo log buffer is 1/3 full or every three seconds and before DB writer writes. So those are the four mechanism that tells log writer to write from that log buffer to the redo log files. And it'll also write, when we do a shut down, all the buffers will be flushed. And so that way, everything will be in sync when the system is shut down.

04:01

Lois: What are the different modes of operation for a database, Bill? And how do these modes impact the recovery capabilities of the database?

Bill: So we have two different modes we can operate in. One is called NOARCHIVELOG mode. It is the default. ARCHIVELOG mode, highly encouraged. But not every environment has to be in ARCHIVELOG mode.

04:21

Nikita: So with ARCHIVELOG mode…

Bill: Closed database. You have to close it, recover to the last backup. That's as far as I can go. Actually, I could, depending on what happens, I might be able to apply some redo.

Suitable for training and test environments or for data warehouses, we don't have a lot of frequent changes. It's mainly bulk loading data at night and querying during the day. So it might be appropriate for that.

Because ARCHIVELOG mode, it is a little overhead. Yes. So with that database, it goes down while it's open. The system, when it comes up, it can recover to the last committed transaction. And this is usually the mode we want to operate in for production environments.

So we have that data in the buffer cache. We have that redo being buffered. We have the undo tablespace, keeping track of what the data was before a change. The redo keeps track of what was the change.

And if we're in ARCHIVELOG mode, as we switch from one redo log to another, we will generate what's referred to as archived log files, and that's what allows us to do a complete recovery.

05:33

Lois: What happens in the case of automatic instance recovery?

Bill: For an automatic instance recovery or crash recovery, our system went down unexpectedly. Because it did not do a clean shutdown, the buffers were not flushed. Everything was not synchronized. So the datafile, controlfile, everything is out of sync.

05:53

Nikita: So, how do the files get synchronized then?

Bill: It uses the redo log groups to synchronize the files. It's going to roll forward. It rolls forward the changes that were made. So due to different distinct operations. Roll forward applies committed and uncommitted data. And the redo does not keep track of what was committed and uncommitted.

It'll keep track of, hey, I had this transaction, hey, here's a commit for that transaction. But hey, I have a transaction. That was never uncommitted. That's the job of the undo. But rolls forward all those changes. And then anything that did not actually receive a commit, it will roll back the uncommitted data, return to the original state. And that is the job of the undo tablespace.

06:37

Lois: Bill, is it possible to tune instance recovery for better performance?

Bill: You can try to tune this instance recovery. Tuning it is touchy. Be careful because you can cause more harm than what you think you might be doing good. The instance recovery, what we're doing, we're trying to-- the transactions between checkpoints. When was the last checkpoint?

Because the items between the checkpoints, that's what has to be reapplied. So the last checkpoint to the last redo log, what is that time frame there between those? Well, what we're going to do, we're going to try to control that. We're going to try to control the difference between the checkpoint and the end of the redo log.

There is a mean-time recovery advisor. You specify the desired times in seconds or minutes that how often you want that checkpoint to occur.

There is a parameter, FAST_START_MTTR parameter that you can set. The default value is zero saying, hey, I'm going to let the system take care of it. And the maximum you can set it is to one hour.

07:46

Nikita: And why 1 hour?

Bill: The reason being, if I set that to one hour and I have a lot of activity, how long is it going to take? How many transactions can happen within that hour? Yeah, I'm not doing a checkpoint as often, so I'm eliminating that workload. But if it has to recover, how long is it going to take?

If I set it too small, the system says, hey, right now, it's going to take me 19 seconds based off statistics. If I said, OK, I want it in five seconds. So what does that mean? Every five seconds, I'm saying do a checkpoint. So what is it doing? OK, time to do a checkpoint.

OK, time to go ahead and OK, DB writer write. OK, log writer write. OK, let me update the datafiles and the controlfiles. So you're just thrashing your system. So be careful if you decide to try to manually tune it.

And when you go out and look at this mean time to recover, and even if you do it through the command line, you'll see that, that value is most likely going to change throughout the day, depending on the workload that you have.

08:46

Lois: How does the process of restoring and recovering data typically work?

Bill: So when we restore, we're restoring our datafiles. All the datafiles, tablespace, controlfiles, archived redo log, server parameter file. Then when we recover, it involves depending on the backup that we use and other factors in there, it is going to apply the redo.

So automatically done by RMAN. So I tell it, this is what I want to do. Hey, I want to restore a database. OK, RMAN says, all right, what backup are you going to use? What is it I need to restore? And then we tell it to recover. OK, I know what I need to use to recover.

So RMAN can do the work for you. So when we restore and recover due to a manual process and there's different methods that we can use, and depending on the failure, we'll drive what type of restore and recovery we might perform.

09:40

Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available in General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started.


10:22

Nikita: Welcome back! Can you talk about the different types of recovery scopes, Bill? How do they compare?

Bill: Recovery can have two kinds of scope. All right. One is the complete recovery. We are getting the database back to the current time of the crash with no loss of data. We're going to again bring everything back to the present.

Incomplete or point-in-time recovery. We're going to take a database or maybe a tablespace or even a table back to a point-in-time in the past. So from the time that we select to take it to recover, everything that was done after that is null and void, is gone missing. That's why it's called incomplete recovery, because it's not complete.

11:09

Lois: What are the steps that take place during complete recovery?

Bill: We restore the datafiles. Changes are applied. We're applying the redo. The datafiles contained committed and uncommitted transactions. The undo is applied. Anything that did not receive an actual commit will take back to the original value. And we have our datafiles recovered.

11:33

Nikita: And what about point-in-time recovery?

Bill: Point-in-time recovery, very similar. We're going to restore the datafiles from as far back as necessary. Changes are applied. So the data files are going to contain the committed and uncommitted up to that point-in-time. Database is open, that redo, that undo, anything that did not actually receive a commit. The undo is applied. The point-in-time recovered is complete. We're not applying all the redo, all the changes, only up to the time that we specify.

12:08

Lois: Are there any features that can make point-in-time recovery quicker?

Bill: We also have the ability to use flashback database. It is an optional feature. And it can be a quick way to do that point-in-time recovery. It is an alternative to that database point-in-time recovery we just looked at. Faster. No restore is required. It's going to rewind the database.

It does require some configuration in the environment. We do have to set up in order to use flashback database.

12:41

Nikita: I want to talk about Oracle’s data protection solutions, particularly when it comes to backup and recovery or disaster recovery.

Bill: So for physical data protection-- backup and recovery objective. Yep, that works for both physical and logical.

My recovery time, hours to days. Possibly minutes to hours for the logical. And Oracle solution, we have the Recovery Manager that's out of the box, RMAN. Oracle Secure Backup, that is Oracle's media management library system backing up to tape. The logical protection, yes, flashback technologies can help me take care of that very easily.

For disaster recovery, physical data protection, recovery time objective, seconds to minutes. We're not going to accomplish that with RMAN. You're going to want to use our Data Guard with the Active Data Guard feature to be able to switch over to a standby database within seconds of a failure.

13:41

Lois: Why would someone choose to use flashback technologies for recovery, Bill?

Bill: With the flashback technologies, we can use it for viewing data as past dates. What did it look like? We can wind the database back and forth in time. Assist users in an error analysis and recovery, because we have different technologies.

This flashback query, version query, transaction query, those allow me to view what was the value of a row at a time. I can even see what were the changes to a row over a period of time? I can also view the query that caused that change.

For error recovery, I can back out a transaction. I can take a table back to a non-current time. I can also flashback a table that was dropped. And I can also take an entire database by using flashback. So the different recovery options I might have with the flashback technology.

14:44

Lois: Thank you so much, Bill. These last two episodes have been so insightful, right Niki?

Nikita: I couldn’t agree more, Lois! If you want to know more about backup and recovery configuration and other concepts, visit mylearn.oracle.com and search for the Oracle Database 23ai: Backup and Recovery course. Our upcoming episode is a very special one, where we’ll be discussing Oracle AI in Fusion Cloud Human Capital Management. So, watch out for that! Until next week, this is Nikita Abraham…

Lois: And Lois Houston, signing off!

15:16

That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.