Postmortem: Fixing WordPress

Issue summary:

On September 30, 2021, at 9:00 am (UTC-5), we received a report that our WordPress site had stopped working. The server was returning a 500 status code. The cause was a typo in the name of a file loaded by wp-settings.php. We found it by tracing the behavior of the server step by step with the strace tool, and the issue was resolved an hour later, at 10:00 am (UTC-5). The web application was down for that hour and 100% of our users were affected, because the site runs on a single server, which is our single point of failure (SPOF).
Timeline:

September 30 9:00 am (UTC-5): An error was reported on our WordPress page.
September 30 9:15 am (UTC-5): We searched the logs for the source of the error but found no clues.
September 30 9:20 am (UTC-5): We checked the status of the server and found that it was running but returned a 500 status code.
root@e514b399d69d:~# curl -sI 127.0.0.1
HTTP/1.0 500 Internal Server Error
Date: Fri, 24 Mar 2017 07:32:16 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.21
Connection: close
Content-Type: text/html
September 30 9:25 am (UTC-5): We listed the active processes with the “ps” command so that we could trace the Apache server from a separate terminal, and found a parent process and a child process.
September 30 9:30 am (UTC-5): In a new terminal we traced the parent process with “strace”, but obtained no relevant information.
September 30 9:35 am (UTC-5): We traced the child process with “strace” while sending a request with “curl” from another terminal, which gave us a record of the system calls the server made while handling the request.
September 30 9:45 am (UTC-5): We found a system call that returned “-1”, indicating an error, and saw that it referred to a file with the extension “.phpp”.
September 30 9:50 am (UTC-5): We used “grep” to search the “/var/www/html/” folder for a file containing that extension and found the configuration file with the typo.
September 30 10:00 am (UTC-5): We wrote a Puppet manifest to fix the typo, and the server returned to normal operation.
root@e514b399d69d:~# puppet apply fix-wordpress.pp
Notice: Compiled catalog for e514b399d69d.ec2.internal in environment production in 0.02 seconds
Notice: /Stage[main]/Main/Exec[fix-wordpress]/returns: executed successfully
Notice: Finished catalog run in 0.08 seconds
root@e514b399d69d:~# curl -sI 127.0.0.1:80
HTTP/1.1 200 OK
Date: Fri, 24 Mar 2017 07:11:52 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.21
Link: <http://127.0.0.1/?rest_route=/>; rel="https://api.w.org/"
Content-Type: text/html; charset=UTF-8
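
The tracing steps in the timeline can be sketched roughly as follows. This is a minimal sketch rather than the exact commands from the incident: the helper name missing_files, the log path /tmp/apache.trace and the PID variable are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the paths of calls that failed with ENOENT
# ("No such file or directory") in a saved strace log.
missing_files() {
    # $1: path to a log written by `strace -o`
    # Keeps the first quoted argument of each failing call.
    grep 'ENOENT' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Possible usage (needs root and the PID of an Apache child taken from `ps`):
#   strace -f -p "$PID" -o /tmp/apache.trace   # terminal 1: attach to the child
#   curl -sI 127.0.0.1                         # terminal 2: trigger a request
#   missing_files /tmp/apache.trace            # afterwards: list missing files
```

Filtering on ENOENT narrows a long trace down to the files the server tried to open but could not find, which is where a misspelled name such as one ending in “.phpp” would show up.
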
Root cause and resolution:

The failure was caused by a typo in the file “wp-settings.php”, which referenced the configuration file as “class-wp-locale.phpp”.
We fixed the bug by correcting the file name to “class-wp-locale.php” with the following Puppet manifest:
# fix the phpp typo
exec { 'fix-wordpress':
  command => 'sudo sed -i s/class-wp-locale.phpp/class-wp-locale.php/g /var/www/html/wp-settings.php',
  path    => ['/usr/bin', '/usr/sbin'],
}
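
For illustration, the substitution performed by the manifest can be tried by hand on a copy of the file before applying it to the server. The function name fix_phpp_typo is hypothetical. Unlike the manifest, this sketch escapes the dot in the search pattern so that it matches only a literal “.”, and it assumes GNU sed for the -i flag.

```shell
#!/bin/sh
# Hypothetical helper: correct the misspelled extension in place.
fix_phpp_typo() {
    # $1: path to the file to patch (for example, a copy of wp-settings.php)
    sed -i 's/class-wp-locale\.phpp/class-wp-locale.php/g' "$1"
}

# Possible usage on a scratch copy:
#   cp /var/www/html/wp-settings.php /tmp/wp-settings.php
#   fix_phpp_typo /tmp/wp-settings.php
#   grep -n 'class-wp-locale' /tmp/wp-settings.php
```
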
Corrective and preventative measures:
To prevent a recurrence, we could add syntax checkers to our workflow and set up a second server so that a single machine is no longer a SPOF.
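
A syntax check alone may not flag a misspelled file name inside a string, so one complementary check is to scan the code for the bad extension before deploying. The sketch below is only an illustration: the function name check_phpp_refs is hypothetical, and the --include option assumes GNU grep.

```shell
#!/bin/sh
# Hypothetical pre-deploy check: fail if any PHP file mentions ".phpp".
check_phpp_refs() {
    # $1: directory to scan, e.g. /var/www/html
    if grep -rn --include='*.php' '\.phpp' "$1"; then
        return 1   # matches were printed above
    fi
    return 0       # nothing suspicious found
}

# Possible usage in a deploy script:
#   check_phpp_refs /var/www/html || { echo "typo found, aborting" >&2; exit 1; }
```
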