Safe Process to Reboot a Storage Server on Exadata Hardware for Maintenance

If you work with Exadata, ZDL, or any other Engineered System built on Exadata hardware and need to replace a hardware part on a storage server, you can follow the steps below to reboot the server safely, without impacting the ASM structure.

In my example, I am going to replace a disk controller battery on a ZDL storage server, but the same procedure applies to any failed part.

Depending on your Exadata software version, when a failure occurs the Exadata software may already have set the disks to WriteThrough, so there is no need to do it manually. If that is not your case, let's check this before starting the reboot.

As the root user at the Linux prompt, execute the command below to check whether your disks are in WriteThrough or WriteBack mode:

[root@ra02celadm01 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show
CLI Version = 007.0530.0000.0000 Sep 21, 2018
Operating system = Linux 5.4.17-2136.330.7.5.el8uek.x86_64
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

--------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name
--------------------------------------------------------------
0/0   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
1/1   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
2/2   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
3/3   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
4/4   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
5/5   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
6/6   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
7/7   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
8/8   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
9/9   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
10/10 RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
11/11 RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
--------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

In the Cache column, NRWBD indicates WriteBack and NRWTD indicates WriteThrough. In this case, all disks are in WriteBack and need to be set to WriteThrough.
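
If you want a quick way to pull out just that column, a small filter like the sketch below (my own helper, assuming the RAID0 output layout shown above) prints the DG/VD identifier and the Cache value for each virtual drive:

# print the DG/VD identifier and the Cache column (e.g. NRWBD or NRWTD) for each virtual drive
/opt/MegaRAID/storcli/storcli64 /c0/vall show | awk '/RAID0/ {print $1, $6}'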

By default, ASM drops a disk shortly after it is taken offline; however, you can set the disk_repair_time attribute to prevent this operation by specifying a time interval to repair the disk and bring it back online. The default disk_repair_time attribute value of 3.6h should be adequate for most environments.

Let’s check repair time for ASM Disks. If you need increase it, change disk_repair_time attribute. Connected on ASM, do the follow:

[oracle@ra02dbadm01 ~]$ sqlplus

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Dec 3 16:05:51 2024
Version 19.23.0.0.0

Copyright (c) 1982, 2023, Oracle.  All rights reserved.

Enter user-name: /as sysdba

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> select dg.name,a.value from v$asm_diskgroup
dg, v$asm_attribute a where dg.group_number=a.group_number and
a.name='disk_repair_time';

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
CATALOG                        3.6h
DELTA                          3.6h

SQL> ALTER DISKGROUP CATALOG SET ATTRIBUTE 'DISK_REPAIR_TIME'='12.0h';

SQL> ALTER DISKGROUP DELTA SET ATTRIBUTE 'DISK_REPAIR_TIME'='12.0h';

SQL> select dg.name,a.value from v$asm_diskgroup
dg, v$asm_attribute a where dg.group_number=a.group_number and
a.name='disk_repair_time';

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
CATALOG                        12.0h
DELTA                          12.0h

SQL>

Note: As of version 12.1.0.1, there is a new diskgroup attribute, failgroup_repair_time, that governs the time a cell is allowed to be offline before its disks are dropped. This value defaults to 24 hours, thus obviating the need to alter the disk_repair_time attribute as in older versions.
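
If you want to confirm that value on your system, you can query it in the same SQL*Plus session with a query analogous to the one above (a quick sketch, simply swapping the attribute name):

select dg.name, a.value
  from v$asm_diskgroup dg, v$asm_attribute a
 where dg.group_number = a.group_number
   and a.name = 'failgroup_repair_time';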

Next, you will need to check whether ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for every grid disk listed:

[root@ra02celadm01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         CATALOG_CD_00_ra02celadm01      ONLINE  Yes
         CATALOG_CD_01_ra02celadm01      ONLINE  Yes
         CATALOG_CD_02_ra02celadm01      ONLINE  Yes
         CATALOG_CD_03_ra02celadm01      ONLINE  Yes
         CATALOG_CD_04_ra02celadm01      ONLINE  Yes
         CATALOG_CD_05_ra02celadm01      ONLINE  Yes
         CATALOG_CD_06_ra02celadm01      ONLINE  Yes
         CATALOG_CD_07_ra02celadm01      ONLINE  Yes
         CATALOG_CD_08_ra02celadm01      ONLINE  Yes
         CATALOG_CD_09_ra02celadm01      ONLINE  Yes
         CATALOG_CD_10_ra02celadm01      ONLINE  Yes
         CATALOG_CD_11_ra02celadm01      ONLINE  Yes
         DELTA_CD_00_ra02celadm01        ONLINE  Yes
         DELTA_CD_01_ra02celadm01        ONLINE  Yes
         DELTA_CD_02_ra02celadm01        ONLINE  Yes
         DELTA_CD_03_ra02celadm01        ONLINE  Yes
         DELTA_CD_04_ra02celadm01        ONLINE  Yes
         DELTA_CD_05_ra02celadm01        ONLINE  Yes
         DELTA_CD_06_ra02celadm01        ONLINE  Yes
         DELTA_CD_07_ra02celadm01        ONLINE  Yes
         DELTA_CD_08_ra02celadm01        ONLINE  Yes
         DELTA_CD_09_ra02celadm01        ONLINE  Yes
         DELTA_CD_10_ra02celadm01        ONLINE  Yes
         DELTA_CD_11_ra02celadm01        ONLINE  Yes

If one or more disks do not return asmdeactivationoutcome='Yes', you should check the respective diskgroup and restore its data redundancy.
Once the diskgroup data redundancy is fully restored, repeat the previous cellcli list command.
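
To spot only the problem disks in a long listing, you can filter the same output with grep (a simple ad-hoc filter; it just hides the lines whose deactivation outcome is 'Yes'):

# show only grid disks whose asmdeactivationoutcome is not 'Yes'
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome | grep -v Yes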

Once all disks return asmdeactivationoutcome='Yes', you can proceed with taking the grid disks offline. As the root user on the storage node, execute the command below:

[root@ra02celadm01 ~]# cellcli -e alter griddisk all inactive
GridDisk CATALOG_CD_00_ra02celadm01 successfully altered
GridDisk CATALOG_CD_01_ra02celadm01 successfully altered
GridDisk CATALOG_CD_02_ra02celadm01 successfully altered
GridDisk CATALOG_CD_03_ra02celadm01 successfully altered
GridDisk CATALOG_CD_04_ra02celadm01 successfully altered
GridDisk CATALOG_CD_05_ra02celadm01 successfully altered
GridDisk CATALOG_CD_06_ra02celadm01 successfully altered
GridDisk CATALOG_CD_07_ra02celadm01 successfully altered
GridDisk CATALOG_CD_08_ra02celadm01 successfully altered
GridDisk CATALOG_CD_09_ra02celadm01 successfully altered
GridDisk CATALOG_CD_10_ra02celadm01 successfully altered
GridDisk CATALOG_CD_11_ra02celadm01 successfully altered
GridDisk DELTA_CD_00_ra02celadm01 successfully altered
GridDisk DELTA_CD_01_ra02celadm01 successfully altered
GridDisk DELTA_CD_02_ra02celadm01 successfully altered
GridDisk DELTA_CD_03_ra02celadm01 successfully altered
GridDisk DELTA_CD_04_ra02celadm01 successfully altered
GridDisk DELTA_CD_05_ra02celadm01 successfully altered
GridDisk DELTA_CD_06_ra02celadm01 successfully altered
GridDisk DELTA_CD_07_ra02celadm01 successfully altered
GridDisk DELTA_CD_08_ra02celadm01 successfully altered
GridDisk DELTA_CD_09_ra02celadm01 successfully altered
GridDisk DELTA_CD_10_ra02celadm01 successfully altered
GridDisk DELTA_CD_11_ra02celadm01 successfully altered

Note: This action could take 10 minutes or longer depending on activity. It is very important to make sure all the disks are offlined successfully before shutting down the cell services. Inactivating the grid disks automatically OFFLINEs the corresponding disks in the ASM instance.
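
If you prefer not to re-run the check by hand, a small shell loop like the sketch below (my own helper, not part of the Exadata tooling; adjust the sleep interval to taste) waits until no grid disk on this cell reports a status other than OFFLINE or UNUSED:

# poll until every grid disk reports OFFLINE or UNUSED in ASM
while cellcli -e list griddisk attributes name,asmmodestatus | grep -vE 'OFFLINE|UNUSED' | grep -q . ; do
    echo "Grid disks are still going offline in ASM, waiting..."
    sleep 30
done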

Confirm that the griddisks are now offline by performing the following actions.

Execute the command below; once the disks are offline in ASM, the output should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED and asmdeactivationoutcome=Yes for all grid disks.

[root@ra02celadm01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         CATALOG_CD_00_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_01_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_02_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_03_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_04_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_05_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_06_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_07_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_08_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_09_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_10_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_11_ra02celadm01      OFFLINE  Yes
         DELTA_CD_00_ra02celadm01        OFFLINE  Yes
         DELTA_CD_01_ra02celadm01        OFFLINE  Yes
         DELTA_CD_02_ra02celadm01        OFFLINE  Yes
         DELTA_CD_03_ra02celadm01        OFFLINE  Yes
         DELTA_CD_04_ra02celadm01        OFFLINE  Yes
         DELTA_CD_05_ra02celadm01        OFFLINE  Yes
         DELTA_CD_06_ra02celadm01        OFFLINE  Yes
         DELTA_CD_07_ra02celadm01        OFFLINE  Yes
         DELTA_CD_08_ra02celadm01        OFFLINE  Yes
         DELTA_CD_09_ra02celadm01        OFFLINE  Yes
         DELTA_CD_10_ra02celadm01        OFFLINE  Yes
         DELTA_CD_11_ra02celadm01        OFFLINE  Yes

And then, execute list griddisk to check the inactive disks:

[root@ra02celadm01 ~]# cellcli -e list griddisk
         CATALOG_CD_00_ra02celadm01      inactive
         CATALOG_CD_01_ra02celadm01      inactive
         CATALOG_CD_02_ra02celadm01      inactive
         CATALOG_CD_03_ra02celadm01      inactive
         CATALOG_CD_04_ra02celadm01      inactive
         CATALOG_CD_05_ra02celadm01      inactive
         CATALOG_CD_06_ra02celadm01      inactive
         CATALOG_CD_07_ra02celadm01      inactive
         CATALOG_CD_08_ra02celadm01      inactive
         CATALOG_CD_09_ra02celadm01      inactive
         CATALOG_CD_10_ra02celadm01      inactive
         CATALOG_CD_11_ra02celadm01      inactive
         DELTA_CD_00_ra02celadm01        inactive
         DELTA_CD_01_ra02celadm01        inactive
         DELTA_CD_02_ra02celadm01        inactive
         DELTA_CD_03_ra02celadm01        inactive
         DELTA_CD_04_ra02celadm01        inactive
         DELTA_CD_05_ra02celadm01        inactive
         DELTA_CD_06_ra02celadm01        inactive
         DELTA_CD_07_ra02celadm01        inactive
         DELTA_CD_08_ra02celadm01        inactive
         DELTA_CD_09_ra02celadm01        inactive
         DELTA_CD_10_ra02celadm01        inactive
         DELTA_CD_11_ra02celadm01        inactive

Revert all the RAID disk volumes to WriteThrough mode to ensure all data in the RAID cache memory is flushed to disk and not lost when the SuperCap is replaced. As the root user, set the cache policy of all logical volumes to WriteThrough:

[root@ra02celadm01 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall set wrcache=WT

Verify the current cache policy for all logical volumes is now WriteThrough:

[root@ra02celadm01 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show
CLI Version = 007.0530.0000.0000 Sep 21, 2018
Operating system = Linux 5.4.17-2136.330.7.5.el8uek.x86_64
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

--------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name
--------------------------------------------------------------
0/0   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
1/1   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
2/2   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
3/3   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
4/4   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
5/5   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
6/6   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
7/7   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
8/8   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
9/9   RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
10/10 RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
11/11 RAID0 Optl  RW     Yes     NRWTD -   ON  16.037 TB
--------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

In the volume table, the Cache column should report as “NRWTD” where WT indicates WriteThrough.

Once all disks are offline and inactive, and the disks are in WriteThrough mode, you can shut down the cell using the following command:

shutdown -hP now

At this point, with the storage node down, the engineer can carry out the maintenance. Once the maintenance is finished, you can either have the engineer power the server back on or do it yourself through the ILOM.

We will not cover that step in this article, but it is a simple task that takes only a few clicks.

When the storage server is back up, you will need to put the disks in WriteBack mode again, if this is not done automatically.

Verify there are no outstanding alerts in the Cell:

[root@ra02celadm01 ~]# cellcli -e list alerthistory
         1_1     2024-09-22T03:14:02-03:00       critical        "Configuration check discovered the following problems:   [Info]: ipconf command line: /opt/oracle.cellos/ipconf.pl -check-consistency -semantic-min -ignore-get-ilom-errors -nocodes [Info]: Verify that the configured values in the Exadata configuration file /opt/oracle.cellos/cell.conf agree with the actual values in use on this system At least one DNS server must be reachable                                                        : FAILED At least one NTP server must be reachable                                                        : FAILED [Info]: Consistency check FAILED"
         1_2     2024-09-24T15:59:58-03:00       clear           "The configuration check was successful."
         2_1     2024-11-21T22:39:18-03:00       critical        "The HDD disk controller battery has failed. All disk drives have been placed in WriteThrough caching mode. Disk write performance may be reduced. The flash drives are not affected. Battery Serial Number : 17031  Battery Type          : cvpm02  Battery Temperature   : 21 C  Pack Energy           : 431 Joule  Ambient Temperature   : 22 C"

Verify all the expected disk devices are present:

[root@ra02celadm01 ~]# lsscsi |grep MR
[0:2:0:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sda
[0:2:1:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdb
[0:2:2:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdc
[0:2:3:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdd
[0:2:4:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdf
[0:2:5:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sde
[0:2:6:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdh
[0:2:7:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdg
[0:2:8:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdi
[0:2:9:0]    disk    AVAGO    MT7341-16i       4.74  /dev/sdj
[0:2:10:0]   disk    AVAGO    MT7341-16i       4.74  /dev/sdk
[0:2:11:0]   disk    AVAGO    MT7341-16i       4.74  /dev/sdm

Verify that the Super Capacitor is visible and its state is 'Optimal':

[root@ra02celadm01 ~]# /opt/MegaRAID/storcli/storcli64 /c0/cv show status
CLI Version = 007.0530.0000.0000 Sep 21, 2018
Operating system = Linux 5.4.17-2136.330.7.5.el8uek.x86_64
Controller = 0
Status = Success
Description = None


Cachevault_Info :
===============

--------------------
Property    Value
--------------------
Type        CVPM02
Temperature 21 C
State       Optimal
--------------------


Firmware_Status :
===============

---------------------------------------
Property                         Value
---------------------------------------
NVCache State                    OK
Replacement required             No
No space to cache offload        No
Module microcode update required No
---------------------------------------


GasGaugeStatus :
==============

------------------------------
Property                Value
------------------------------
Pack Energy             253 J
Capacitance             100 %
Remaining Reserve Space 0
------------------------------

Check disk status:

[root@ra02celadm01 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         CATALOG_CD_00_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_01_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_02_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_03_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_04_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_05_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_06_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_07_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_08_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_09_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_10_ra02celadm01      OFFLINE  Yes
         CATALOG_CD_11_ra02celadm01      OFFLINE  Yes
         DELTA_CD_00_ra02celadm01        OFFLINE  Yes
         DELTA_CD_01_ra02celadm01        OFFLINE  Yes
         DELTA_CD_02_ra02celadm01        OFFLINE  Yes
         DELTA_CD_03_ra02celadm01        OFFLINE  Yes
         DELTA_CD_04_ra02celadm01        OFFLINE  Yes
         DELTA_CD_05_ra02celadm01        OFFLINE  Yes
         DELTA_CD_06_ra02celadm01        OFFLINE  Yes
         DELTA_CD_07_ra02celadm01        OFFLINE  Yes
         DELTA_CD_08_ra02celadm01        OFFLINE  Yes
         DELTA_CD_09_ra02celadm01        OFFLINE  Yes
         DELTA_CD_10_ra02celadm01        OFFLINE  Yes
         DELTA_CD_11_ra02celadm01        OFFLINE  Yes

And then, execute list griddisk to check that the disks are still inactive:

[root@ra02celadm01 ~]# cellcli -e list griddisk
         CATALOG_CD_00_ra02celadm01      inactive
         CATALOG_CD_01_ra02celadm01      inactive
         CATALOG_CD_02_ra02celadm01      inactive
         CATALOG_CD_03_ra02celadm01      inactive
         CATALOG_CD_04_ra02celadm01      inactive
         CATALOG_CD_05_ra02celadm01      inactive
         CATALOG_CD_06_ra02celadm01      inactive
         CATALOG_CD_07_ra02celadm01      inactive
         CATALOG_CD_08_ra02celadm01      inactive
         CATALOG_CD_09_ra02celadm01      inactive
         CATALOG_CD_10_ra02celadm01      inactive
         CATALOG_CD_11_ra02celadm01      inactive
         DELTA_CD_00_ra02celadm01        inactive
         DELTA_CD_01_ra02celadm01        inactive
         DELTA_CD_02_ra02celadm01        inactive
         DELTA_CD_03_ra02celadm01        inactive
         DELTA_CD_04_ra02celadm01        inactive
         DELTA_CD_05_ra02celadm01        inactive
         DELTA_CD_06_ra02celadm01        inactive
         DELTA_CD_07_ra02celadm01        inactive
         DELTA_CD_08_ra02celadm01        inactive
         DELTA_CD_09_ra02celadm01        inactive
         DELTA_CD_10_ra02celadm01        inactive
         DELTA_CD_11_ra02celadm01        inactive

To set the disks back to WriteBack mode, they must remain inactive.

Set the cache policy of all logical drives to WriteBack cache mode:

/opt/MegaRAID/storcli/storcli64 /c0/vall set wrcache=WB

Verify the current cache policy for all logical volumes is now WriteBack:

[root@ra02celadm01 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show
CLI Version = 007.0530.0000.0000 Sep 21, 2018
Operating system = Linux 5.4.17-2136.330.7.5.el8uek.x86_64
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

--------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name
--------------------------------------------------------------
0/0   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
1/1   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
2/2   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
3/3   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
4/4   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
5/5   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
6/6   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
7/7   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
8/8   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
9/9   RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
10/10 RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
11/11 RAID0 Optl  RW     Yes     NRWBD -   ON  16.037 TB
--------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

In the volume table, the Cache column should report as “NRWBD” where WB indicates WriteBack.

Verify there are no outstanding alerts in the Cell:

[root@ra02celadm01 ~]# cellcli -e list alerthistory
         1_1     2024-09-22T03:14:02-03:00       critical        "Configuration check discovered the following problems:   [Info]: ipconf command line: /opt/oracle.cellos/ipconf.pl -check-consistency -semantic-min -ignore-get-ilom-errors -nocodes [Info]: Verify that the configured values in the Exadata configuration file /opt/oracle.cellos/cell.conf agree with the actual values in use on this system At least one DNS server must be reachable                                                        : FAILED At least one NTP server must be reachable                                                        : FAILED [Info]: Consistency check FAILED"
         1_2     2024-09-24T15:59:58-03:00       clear           "The configuration check was successful."
         2_1     2024-11-21T22:39:18-03:00       critical        "The HDD disk controller battery has failed. All disk drives have been placed in WriteThrough caching mode. Disk write performance may be reduced. The flash drives are not affected. Battery Serial Number : 17031  Battery Type          : cvpm02  Battery Temperature   : 21 C  Pack Energy           : 431 Joule  Ambient Temperature   : 22 C"
         2_2     2024-12-03T19:58:46-03:00       info            "The HDD disk controller battery was replaced.  Battery Serial Number : 13487  Battery Type          : cvpm02  Battery Temperature   : 18 C  Pack Energy           : 238 Joule  Ambient Temperature   : 21 C"
         2_3     2024-12-03T20:05:49-03:00       clear           "All disk drives are in WriteBack caching mode.  Battery Serial Number : 13487  Battery Type          : cvpm02  Battery Temperature   : 20 C  Pack Energy           : 436 Joule  Ambient Temperature   : 19 C"

Reactivate the griddisks:

[root@ra02celadm01 ~]# cellcli -e alter griddisk all active
GridDisk CATALOG_CD_00_ra02celadm01 successfully altered
GridDisk CATALOG_CD_01_ra02celadm01 successfully altered
GridDisk CATALOG_CD_02_ra02celadm01 successfully altered
GridDisk CATALOG_CD_03_ra02celadm01 successfully altered
GridDisk CATALOG_CD_04_ra02celadm01 successfully altered
GridDisk CATALOG_CD_05_ra02celadm01 successfully altered
GridDisk CATALOG_CD_06_ra02celadm01 successfully altered
GridDisk CATALOG_CD_07_ra02celadm01 successfully altered
GridDisk CATALOG_CD_08_ra02celadm01 successfully altered
GridDisk CATALOG_CD_09_ra02celadm01 successfully altered
GridDisk CATALOG_CD_10_ra02celadm01 successfully altered
GridDisk CATALOG_CD_11_ra02celadm01 successfully altered
GridDisk DELTA_CD_00_ra02celadm01 successfully altered
GridDisk DELTA_CD_01_ra02celadm01 successfully altered
GridDisk DELTA_CD_02_ra02celadm01 successfully altered
GridDisk DELTA_CD_03_ra02celadm01 successfully altered
GridDisk DELTA_CD_04_ra02celadm01 successfully altered
GridDisk DELTA_CD_05_ra02celadm01 successfully altered
GridDisk DELTA_CD_06_ra02celadm01 successfully altered
GridDisk DELTA_CD_07_ra02celadm01 successfully altered
GridDisk DELTA_CD_08_ra02celadm01 successfully altered
GridDisk DELTA_CD_09_ra02celadm01 successfully altered
GridDisk DELTA_CD_10_ra02celadm01 successfully altered
GridDisk DELTA_CD_11_ra02celadm01 successfully altered

Issue the command below and all disks should show active:

[root@ra02celadm01 ~]# cellcli -e list griddisk
         CATALOG_CD_00_ra02celadm01      active
         CATALOG_CD_01_ra02celadm01      active
         CATALOG_CD_02_ra02celadm01      active
         CATALOG_CD_03_ra02celadm01      active
         CATALOG_CD_04_ra02celadm01      active
         CATALOG_CD_05_ra02celadm01      active
         CATALOG_CD_06_ra02celadm01      active
         CATALOG_CD_07_ra02celadm01      active
         CATALOG_CD_08_ra02celadm01      active
         CATALOG_CD_09_ra02celadm01      active
         CATALOG_CD_10_ra02celadm01      active
         CATALOG_CD_11_ra02celadm01      active
         DELTA_CD_00_ra02celadm01        active
         DELTA_CD_01_ra02celadm01        active
         DELTA_CD_02_ra02celadm01        active
         DELTA_CD_03_ra02celadm01        active
         DELTA_CD_04_ra02celadm01        active
         DELTA_CD_05_ra02celadm01        active
         DELTA_CD_06_ra02celadm01        active
         DELTA_CD_07_ra02celadm01        active
         DELTA_CD_08_ra02celadm01        active
         DELTA_CD_09_ra02celadm01        active
         DELTA_CD_10_ra02celadm01        active
         DELTA_CD_11_ra02celadm01        active

Verify all grid disks have been successfully put online using the following command:

[root@ra02celadm01 ~]# cellcli -e list griddisk attributes name, asmmodestatus
         CATALOG_CD_00_ra02celadm01      ONLINE
         CATALOG_CD_01_ra02celadm01      ONLINE
         CATALOG_CD_02_ra02celadm01      ONLINE
         CATALOG_CD_03_ra02celadm01      ONLINE
         CATALOG_CD_04_ra02celadm01      ONLINE
         CATALOG_CD_05_ra02celadm01      ONLINE
         CATALOG_CD_06_ra02celadm01      ONLINE
         CATALOG_CD_07_ra02celadm01      ONLINE
         CATALOG_CD_08_ra02celadm01      ONLINE
         CATALOG_CD_09_ra02celadm01      ONLINE
         CATALOG_CD_10_ra02celadm01      ONLINE
         CATALOG_CD_11_ra02celadm01      ONLINE
         DELTA_CD_00_ra02celadm01        ONLINE
         DELTA_CD_01_ra02celadm01        ONLINE
         DELTA_CD_02_ra02celadm01        ONLINE
         DELTA_CD_03_ra02celadm01        ONLINE
         DELTA_CD_04_ra02celadm01        ONLINE
         DELTA_CD_05_ra02celadm01        ONLINE
         DELTA_CD_06_ra02celadm01        ONLINE
         DELTA_CD_07_ra02celadm01        ONLINE
         DELTA_CD_08_ra02celadm01        ONLINE
         DELTA_CD_09_ra02celadm01        ONLINE
         DELTA_CD_10_ra02celadm01        ONLINE
         DELTA_CD_11_ra02celadm01        ONLINE

Some disks may show SYNCING status; this means ASM is resynchronizing (and, afterwards, rebalancing) the data for those disks. This is normal and can take a long time to finish.

ASM rebalance phase will start automatically after disk resync phase completes if using GI version 12.1 or higher.

Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
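
As with the offline step, you can poll this from the cell with a small shell loop (again a sketch of my own, not part of the Exadata tooling) that waits until every grid disk reports ONLINE:

# poll until every grid disk reports ONLINE in ASM
while cellcli -e list griddisk attributes name,asmmodestatus | grep -v ONLINE | grep -q . ; do
    sleep 60
done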

Note: this operation uses the Fast Mirror Resync feature, which does not trigger an ASM rebalance if you are using a GI version < 12.1. The resync operation restores only the extents that would have been written while the disk was offline.
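
If you want to follow the resync and rebalance progress from the ASM side, one option (a sketch; the PASS column is available from GI 12.1 onward) is to watch v$asm_operation in a SQL*Plus session on the ASM instance:

select group_number, operation, pass, state, sofar, est_work, est_minutes
  from v$asm_operation;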

…and that is it.

