Tomalla
Designer
General Modder
Posts: 525
|
Post by Tomalla on Oct 20, 2018 14:22:11 GMT -8
|
|
|
|
|
Tomalla
Designer
General Modder
Posts: 525
|
Post by Tomalla on Oct 22, 2018 11:47:48 GMT -8
My crawler has found 8376 links in total. This includes links in typical anchor tags (like: <a href="link.html">...</a>) as well as image source URLs (like: <img src="fancy.png">). There's one caveat, however, when it comes to the dead links I've listed above: there are more dead links in the manual. The crawler ignores links it has already visited and does not visit them more than once (otherwise it would immediately fall into a crawling loop, revisiting pages that link to themselves). I mistakenly relied on this mechanism to count the dead links. What I actually counted is the number of unique 404 URLs that appear in the manual, so multiple occurrences of the same "dead URL" counted as one. My bad, sorry for that.
I've corrected the crawler and run it on the same old manual I had at the very beginning (so your most recent fixes won't be visible here), and it turns out there are not 28 dead links in the manual, but 94. Way too many to list here. That's why I've posted a sheet with all the data here: docs.google.com/spreadsheets/d/1YVkibnkz5uvaa6NO50Po9zPOjRJTcEPzIyrZ3oDILQo/edit?usp=sharing
There are three columns: the path to the file where the dead link is, the actual URL of the 404 resource, and the HTML tag which contains the dead URL. You can make a copy of the document or download it and make your own amendments if you wish. And responding to your question: 94 dead links out of 8376 would make the correctness rate 98.88%.
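To illustrate the counting mistake described above, here is a minimal sketch in Python. It is not the actual crawler: the `SITE` dictionary, the file names and the helper functions are all made up for the example. The point is only that the "visited" set should stop pages from being crawled twice, while every dead link occurrence is still recorded.

```python
# Hypothetical in-memory "site": page path -> URLs referenced on that page.
# None of these names come from the real manual.
SITE = {
    "index.html": ["a.html", "b.html", "missing.html"],
    "a.html": ["index.html", "missing.html", "gone.png"],
    "b.html": ["a.html", "missing.html"],
}


def crawl(start):
    """Visit each page once, but record every dead-link occurrence."""
    visited = set()
    occurrences = []  # (page containing the link, dead URL)
    queue = [start]
    while queue:
        page = queue.pop()
        if page in visited:
            continue  # avoids the crawling loop
        visited.add(page)
        for url in SITE[page]:
            if url not in SITE:
                # Dead link: record it every time it appears.
                occurrences.append((page, url))
            elif url not in visited:
                queue.append(url)
    return occurrences


def correctness_rate(total_links, dead_links):
    """Percentage of links that are not dead, rounded to two decimals."""
    return round(100 * (1 - dead_links / total_links), 2)


occurrences = crawl("index.html")
unique_dead = {url for _, url in occurrences}
print(len(unique_dead), "unique dead URLs,", len(occurrences), "dead link occurrences")
print(correctness_rate(8376, 94))  # figures quoted in the post -> 98.88
```

In this toy site there are 2 unique dead URLs but 4 dead link occurrences, which is the same kind of gap as the 28 versus 94 in the manual.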
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 22, 2018 11:53:43 GMT -8
I am working my way through the original post, making web storage updates as I go along. I am late for lunch (just looked at the timepiece), so I will look at the expanded list after eating.
Edit: 4:55PM breaking for dinner ... not sure if I will resume today afterward.
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 23, 2018 9:55:56 GMT -8
Many of the broken links share the same cause: too many (or too few) ../ strings to refer to an element outside of the folder in which the flawed element resides.
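As a rough illustration of the ../ problem (the folder layout and file names below are hypothetical, not the real manual), this Python sketch resolves a relative link against the folder of the page that contains it. One ../ too many or too few lands on a path that does not exist.

```python
import posixpath


def resolve(page_path, href):
    """Resolve href relative to the folder that contains page_path."""
    base_dir = posixpath.dirname(page_path)
    return posixpath.normpath(posixpath.join(base_dir, href))


# Hypothetical files assumed to exist on the web host.
existing = {
    "manual/index.html",
    "manual/tilez/switchez.html",
    "manual/images/logo.png",
}

page = "manual/tilez/switchez.html"
for href in ("../images/logo.png", "../../images/logo.png", "images/logo.png"):
    target = resolve(page, href)
    status = "OK" if target in existing else "BROKEN"
    print(href, "->", target, status)
```

Only the first link, with exactly one ../, resolves to an existing file here; the other two point outside or too deep inside the folder tree.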
Unfortunately (for me), many of the affected elements have been stripped of the carriage return/line feed characters that are used to indent HTML elements. Being very much OCD about programming technique, I spend much of the correction time simply restoring the desired indentation. The affected elements share one thing in common ... a string of 8 (apparently) blank characters before the <title> element, which I believe are actually control characters that some program uses to restore indentation within a browser. With the CR/LFs stripped, incredibly long strings are generated ... at least 4,096 characters long ... much longer than can be displayed even at today's screen resolutions!
When I was working on Gaby's index of Custom Levelz (which had the CR/LFs removed), a change in the length of an early string resulted in the display of the entire element(!) being shifted, which makes me think that the limit on string length is much greater than 4,096 characters.
So please bear with me concerning the amount of time it is taking to resolve these broken links.
|
|
Tomalla
Designer
General Modder
Posts: 525
|
Post by Tomalla on Oct 23, 2018 10:35:11 GMT -8
It looks like every *.html file HAS the new line characters and IS in fact split into lines. The problem is that your editor decodes those characters wrongly. I bet you are using Windows Notepad or the like, aren't you? The problem with the files is that, more often than not, the new line character used is a lone CR (0x0D, or 13 in decimal), which is rarely used today; it was the convention on classic Mac OS. It's not the native way to mark a new line on Windows (where the CR LF combination, 0x0D 0x0A or 13 10 in decimal, is used), so the classic Notepad application will not decode them properly, resulting in long lines.
If you value your sanity to any extent, please DO NOT attempt to manually correct all these files. Instead, open them in a decent text editor (the best free alternative would be Notepad++) and work with the files from there. Or, if you really want to use your original text editor, CONVERT the files to the proper format with Windows-supported CRLF line endings beforehand.
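For anyone who would rather script the conversion than do it in an editor, here is a small Python sketch of the idea. The function names and the sample bytes are my own for illustration; it works on raw bytes so no character encoding has to be guessed.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
    }


def to_crlf(data: bytes) -> bytes:
    """Convert every line ending (CRLF, lone CR, lone LF) to CRLF."""
    # Normalise everything to LF first so no ending is converted twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


# Made-up sample that mixes all three styles.
sample = b"<html>\r<head>\r\n<title>x</title>\n</html>"
print(count_line_endings(sample))
print(to_crlf(sample))
```

To apply it to a real file, read it with `open(path, "rb")` and write the result back with `open(path, "wb")`, ideally on a copy first.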
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 23, 2018 11:03:57 GMT -8
Downloaded and installed Notepad++ and used it on one of the Appendix C elements. It works great! That will save me a LOT of time!!! One difference between Notepad and Notepad++ is the location of the Replace menu item. In ++ it is buried a bit deeper, but I can live with that.
Thank You!
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 23, 2018 13:29:20 GMT -8
Notepad++ is definitely superior to Notepad!
Since it recognizes the CR/LF combination, it also allows me to combine lines easily. Plus, it has a working "replace all in all opened documents", so the repeated error (as in the Appendix C elements) was fixed in one operation.
It took some getting used to, but I now also appreciate the 'pairing' of tags. It allows me to see unpaired tags immediately and fix them. Stray characters, such as an extra > in an ending tag, are also displayed as if in neon light ... glaring.
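The same kind of unpaired-tag check can be roughly approximated in a script. The sketch below uses Python's standard html.parser module; the class and function names are made up for the example, and it is a strict check that assumes every non-void tag is explicitly closed (HTML itself allows some closing tags, such as </li> or </p>, to be omitted).

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagPairChecker(HTMLParser):
    """Tracks open tags on a stack and notes closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check(html):
    """Return a list of pairing problems found in the given HTML text."""
    checker = TagPairChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]


print(check("<p><b>fine</b><br></p>"))
print(check("<ul><li>one</li>"))
```

The first call reports nothing, while the second reports the <ul> that was never closed.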
Progress!
|
|
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 25, 2018 10:18:05 GMT -8
A new web crawler review of the GLEM is in order. I think I have covered all of the broken linkz, but if the web crawler is avoiding possible recursive passes ... there may still be some corrections required.
Edit: And I may have caused broken linkz because I deleted three elements from the web host regarding Designerz and Collaboratorz. The web crawler would identify those elements still referring to those lists ... which were a pain to maintain. I doubt anyone cared about the difficulty levels being displayed in one place for some 60+ people anyway.
|
|
Tomalla
Designer
General Modder
Posts: 525
|
Post by Tomalla on Oct 28, 2018 4:07:41 GMT -8
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 28, 2018 10:07:22 GMT -8
Corrected the Tile element today. The Switchez problem was simply that I had not re-uploaded the file to the web host.
There should not be any broken links in the Gruntz Level Editor Manual now.
|
|
swietymiki
Global Moderator
Posts: 683
|
Post by swietymiki on Oct 28, 2018 12:01:01 GMT -8
The link to Switchez still doesn't work because it's missing the .html extension.
|
|
GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Oct 28, 2018 20:27:02 GMT -8
Try it now. I also updated some information in another Tilez folder, relating (obtusely) to the DisGruntled concept.
|
|