
Site Size Rechecker Script Documentation

Purpose

The main purpose of the site_size_rechecker.py script is to automate rechecking older sites listed on 512kb.club and to ensure that their recorded sizes are up to date.

How it works

  1. Read the sites.yml file
  2. Analyze the last_checked key of each site
    1. Sort the sites by last_checked in ascending order, earliest first
    2. Non-date values such as "N/A" are listed before dated values
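
The ordering described above can be sketched in a few lines of Python. This is only an illustration, not the script's actual code: the sample entries and the domain key are assumptions made for the example, and only last_checked comes from this document.

```python
from datetime import date

def sort_key(site):
    """Order sites so non-date values come first, then the oldest dates."""
    checked = site.get("last_checked")
    if isinstance(checked, date):
        return (1, checked)
    # Anything that is not a date (for example the string "N/A") sorts first.
    return (0, date.min)

# Hypothetical entries; the "domain" key is an assumption for this sketch.
sites = [
    {"domain": "b.example", "last_checked": date(2021, 5, 1)},
    {"domain": "a.example", "last_checked": "N/A"},
    {"domain": "c.example", "last_checked": date(2020, 12, 31)},
]

queue = sorted(sites, key=sort_key)
print([s["domain"] for s in queue])  # ['a.example', 'c.example', 'b.example']
```

Returning a tuple keeps the comparison well defined, since Python cannot compare a string such as "N/A" with a date directly.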

Requirements

Installation

  1. Create an account with GTmetrix.com
    1. Go to account settings and generate an API Key
  2. Install ruamel.yaml Python library (available via pip or a package manager).
  3. Install python-gtmetrix.
    1. It is recommended to simply git clone the repository, since its only dependency is requests, which is available via pip or as an OS package.
  4. Create an authentication file named myauth.py with the following format:
    email='email@example.com'
    api_key='96bcab1060d838723701010387159086'
    
    1. email: the address used to create the GTmetrix account
    2. api_key: the key generated in step 1.1
  5. Copy site_size_rechecker.py and myauth.py into the python-gtmetrix directory cloned in step 3

Note: Under the new plan, you first receive 100 credits from GTmetrix for testing, after which you get a refill of 10 credits every day at 8:45 PM +0000. This script uses 0.7 credits per site check, which works out to about 14 site reports per day per person.
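
The credit arithmetic in the note can be checked with a small sketch. The constants are taken from the note; the function name is made up for this example.

```python
from fractions import Fraction

COST_PER_CHECK = Fraction(7, 10)  # 0.7 credits per site check
DAILY_REFILL = 10                 # credits refilled each day
INITIAL_CREDITS = 100             # one-time testing credits

def checks_affordable(credits):
    """Return how many whole site checks the given credits pay for."""
    return int(credits // COST_PER_CHECK)

print(checks_affordable(DAILY_REFILL))     # 14
print(checks_affordable(INITIAL_CREDITS))  # 142
```

Fraction is used instead of a float so that the division is exact.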

Usage

While in the python-gtmetrix folder, run:

python site_size_rechecker.py ../512kb.club/_data/sites.yml XY

Note: XY stands for the number of sites to be checked

Successful Output

A successful run generates a Markdown table, which must be included in the PR, as in #450:

Site | old size (team) | new size (team) | delta (%) | GTmetrix | note
---- | --------------- | --------------- | --------- | -------- | ----
[docs.j7k6.org](https://docs.j7k6.org/) | 73.0kb (green) | 72.9kb (green) | -0.1kb (-0%) | [report](https://GTmetrix.com/reports/docs.j7k6.org/PkIra4ns/#waterfall) |
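
The delta column of the row above can be reproduced with a short sketch. The function name and the format string are assumptions chosen to match the sample row; the script's actual formatting code may differ.

```python
def format_delta(old_kb, new_kb):
    """Format the size change as 'delta kb (percent%)', like the delta column."""
    delta = new_kb - old_kb
    # Assumes old_kb is non-zero; a zero old size would need special handling.
    percent = delta / old_kb * 100
    return "%.1fkb (%.0f%%)" % (delta, percent)

print(format_delta(73.0, 72.9))  # -0.1kb (-0%)
```

A change of -0.1kb on a 73.0kb site is about -0.14%, which rounds to the -0% shown in the table.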

Note: The script pauses for about 30 seconds in the middle of each line except the first two (the table header). It first prints the site name and old size, then waits for the GTmetrix scan to finish, and only then prints the new size and the rest of the line.

This way, if the script runs into a problem during a GTmetrix scan, you know which site caused it and can either check that site manually or exclude it from checking.
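
The print-then-wait behaviour described above can be sketched as follows. The names check_site and fake_scan are hypothetical, and fake_scan only stands in for a real GTmetrix scan.

```python
import time

def check_site(name, old_size, scan):
    """Print the start of a row, run the slow scan, then finish the row."""
    # Flush so the site name is visible while the scan is still running.
    print("%s | %.1fkb | " % (name, old_size), end="", flush=True)
    new_size = scan(name)  # may take a long time, or raise an exception
    print("%.1fkb" % new_size)
    return new_size

def fake_scan(name):
    """Hypothetical stand-in for a GTmetrix scan."""
    time.sleep(0.1)
    return 72.9

check_site("docs.j7k6.org", 73.0, fake_scan)
```

If scan raises an exception, the partial line already on screen shows which site was being checked.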

Fine-tuning

Wait-time

To decrease waiting time, edit the python-gtmetrix/gtmetrix/interface.py file and change the number 30 in line 85 to a smaller number - for example, change this line from

time.sleep(30)

to

time.sleep(3)

This will decrease the delay between each check when the script is waiting for the GTmetrix scan to finish.

The recommended poll interval is 1 second; I suggest setting it to 3 seconds. The default in the interface.py file is 30 seconds.
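
To show what the poll interval controls, here is a minimal polling loop. It is not the code from interface.py: fetch_state, max_polls, and the "completed" state name are assumptions made for this sketch.

```python
import time

def wait_for_scan(fetch_state, poll_interval=3, max_polls=100):
    """Call fetch_state() repeatedly until the scan reaches a final state."""
    for _ in range(max_polls):
        state = fetch_state()
        # "completed" is an assumed state name; "error" appears in API responses.
        if state in ("completed", "error"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("scan did not finish in time")

# Simulated scan that finishes on the third poll.
states = iter(["queued", "started", "completed"])
print(wait_for_scan(lambda: next(states), poll_interval=0))  # completed
```

A shorter interval returns results sooner but sends more requests to the API while a scan is running.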

Excluding a site from checks

To exclude a site from checks, either remove the site or set its last_checked key to today's date or a date in the future, which moves it to the end of the list.
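
The second option can be illustrated on in-memory data. In practice you would edit sites.yml directly; the exclude_site helper and the domain key are assumptions for this sketch.

```python
from datetime import date

def exclude_site(sites, domain, today=None):
    """Mark a site as checked today so it sorts to the end of the list."""
    today = today or date.today()
    for site in sites:
        if site.get("domain") == domain:
            site["last_checked"] = today
            return True
    return False  # no site with that domain was found

sites = [{"domain": "a.example", "last_checked": "N/A"}]
print(exclude_site(sites, "a.example", today=date(2021, 6, 1)))  # True
print(sites[0]["last_checked"])                                  # 2021-06-01
```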

Troubleshooting

If you encounter an issue with this script, open a new issue and tag @Lex-2008.

Please provide as much information as possible, such as:

  • All output
  • The current state of sites.yml: whether it is from the master branch or has been modified

To debug why the script "hangs" when checking some site, edit the python-gtmetrix/gtmetrix/interface.py file and add a new 87th line, so that the code looks like this:

Original file

response_data = self._request(self.poll_state_url)
self.state = response_data['state']

Edited file

response_data = self._request(self.poll_state_url)
print(response_data)
self.state = response_data['state']

This will break the nicely formatted table output, but you will see the raw response from the GTmetrix API, for example:

{'resources': {}, 'error': 'An error occurred fetching the page: HTTPS error: hostname verification failed', 'results': {}, 'state': 'error'}
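
A response like the one above could be checked for errors before it is used. The following is a minimal sketch; scan_error is a hypothetical helper and is not part of the script or of python-gtmetrix.

```python
def scan_error(response_data):
    """Return the error message if the response reports a failed scan, else None."""
    if response_data.get("state") == "error":
        return response_data.get("error", "unknown error")
    return None

# The sample response shown above.
response = {
    "resources": {},
    "error": "An error occurred fetching the page: HTTPS error: hostname verification failed",
    "results": {},
    "state": "error",
}
print(scan_error(response))
```

For the sample response this prints the error string, which names the cause of the failure (here, a hostname verification problem).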

Future plans

Currently, this script doesn't check for errors returned by the GTmetrix.com API. That's the next item on my list. Moreover, I will get rid of the python-gtmetrix dependency, since it causes more trouble than it is worth.