BEST WEBSITE FOR SQL SERVER DBA
- log backup is also based on full backup and a full backup contains all data of database, but not including all logs. Full backup only contains all data pages and logs in the tail of current database.
- Activity log means log records of current running affairs
In a previous post (see here) I debunked a myth about how much transaction log a full backup would include. I had a question in the blog post comments that asked (paraphrasing):
The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all thr transaction log between the checkpoint and the begin LSN? What is it used for?
I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline – so I got all enthusiastic and created a picture in Powerpoint to help explain better.
Consider the timeline in the picture above for a full backup (the red numbers match the list below):
The backup operation take a checkpoint to force all dirty pages in the buffer pool to disk – both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
The read operation reads page X
Transaction A starts
Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again – it’s already passed that point in the database.
Transaction B starts. It won’t complete before the data read operation completes so it’s begin LSN is the oldest active transaction begin LSN.
Transaction A commits. This commits the changes to page X.
The backup data read operation completes and transaction log reading starts.
Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.
If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).
So, (ignoring replication) the minimum LSN of the transaction log that’s included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.
Much easier to explain with aid of a picture than without! 🙂
Why does a data backup include transaction log?
Question: I’m what you call an ‘involuntary’ DBA and I’ve noticed something strange. Whenever I do a database backup, SQL Server says that it backed up the data files and the transaction log. Why does it need to back up the transaction log too?
Answer: The simple answer is that a data backup must include some transaction log otherwise the backup isn’t valid. When a database backup is restored, the result must be a transactionally consistent database (i.e. with no uncommitted transactions or structural inconsistencies).
SQL Server does not pause insert/update/delete activity in the database while it is being backed up, which means that the database can be changing while it is being backed up. Imagine the case of a database that includes a simple table with a single non-clustered index, with the nonclustered index page near the start of the data file and the table data page near the end of the file.
The backup starts reading from the data file. It reads the nonclustered index page. Then an insert occurs that inserts a data record in the table data page and an index record in the nonclustered index page. Another checkpoint occurs which flushes these two pages to disk. All the while the backup is reading from the data file. It then reads the table data page.
You can see how it is perfectly possible for the pre-insert version of the nonclustered index page to be in the backup, and the post-insert version of the table data page to be in the backup. If the backup does not include any transaction log, the restored database from our backup will have an inconsistency; a table data record that does not have a matching nonclustered index record.
However, a backup DOES include some transaction log. So in my example, the log records for the inserts into the two pages would be included in the backup and then analyzed during the restore. The log record that does the insert into the table data page will be ignored because the backup already contains the post-insert page image. The log record that does the insert into the nonclustered index page will be replayed – performing the insert on the page – as SQL Server can tell that the index page image is old and needs to be brought up-to-date.
The amount of transaction log that a data backup includes is from the end of the data-reading portion of the backup all the way back to the oldest point determined by:
- The checkpoint performed when the backup started
- The start of the oldest uncommitted transaction at the time the checkpoint occurred
- The start of the oldest unreplicated transaction at the time the checkpoint occurred
There’s a lot more going on under the covers but this should give you a good understanding of why a data backup needs to include transaction log: to allow the database to restore to a transactionally consistent point in time (as of the end of the data reading portion of the backup operation).