-
Posts
1,201 -
Profile Information
-
Location
Cambridge, MA
Your Achievements
-
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2310
Ugh. To delete those pages, add -exec rm {} + to the find command (WITHOUT the | wc -l). I'm going to try to rerun now; some time has passed since I last did it. -
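The count-then-delete step above could be wrapped up roughly like this. It is only a sketch: the function names and the directory argument are made up for illustration, and the marker string is the one quoted in the post. It uses grep -F so the marker is matched as literal text, and it keeps counting and deleting separate so the list can be checked before anything is removed.

```shell
# Marker text copied from the CAPTCHA pages described in the post.
CAPTCHA_MARKER="In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle"

# Print the path of every file under directory $1 that contains the marker.
# (find exits non-zero when a grep batch matches nothing, hence "|| true".)
list_captcha_pages() {
  find "$1" -type f -exec grep -lF -- "$CAPTCHA_MARKER" {} + || true
}

# Print how many files under directory $1 contain the marker.
count_captcha_pages() {
  list_captcha_pages "$1" | wc -l | tr -d ' '
}

# Delete every file under directory $1 that contains the marker.
# grep -q acts as a filter; only matching files reach the rm batch.
delete_captcha_pages() {
  find "$1" -type f -exec grep -qF -- "$CAPTCHA_MARKER" {} \; -exec rm -f -- {} +
}
```

Run count_captcha_pages on the mirror directory first, and delete_captcha_pages only once the number looks plausible.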
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
I'm asking someone if they could properly scrape crsociety using multiple proxies to bypass the CAPTCHAs. There might be a price associated with it; we're still trying to figure this out.
==
archivebot still running:
https://www.crsociety.org/ on 10-28; 174,325.9 MB in 536,936 resp. at 0.7/s, 264,419 in q.; 1 con. w/ 1000 ms delay; igoff
==
If you search by user, try https://www.crsociety.org/profile/5068-alex-k-chen/content/page/43/?type=forums_topic_post [pages 1 to 43...] -
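For a rough sense of how long the archivebot job still has to run, the queue size and response rate from the status line can be turned into an estimate. This is a back-of-the-envelope sketch: it assumes the rate stays at 0.7 responses per second and that the queue stops growing, which it may not as new links are discovered. The function name eta_days is made up for illustration.

```shell
# Estimate the days needed to drain a crawl queue at a constant rate.
# $1 = items left in the queue, $2 = responses per second.
eta_days() {
  awk -v q="$1" -v r="$2" 'BEGIN { printf "%.1f\n", q / r / 86400 }'
}

# Figures from the status line above: 264,419 queued at 0.7 responses/s.
eta_days 264419 0.7   # prints 4.4
```

So at that rate the remaining queue alone is more than four days of crawling.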
Metformin - yet another u-turn?
Alex K Chen replied to TomBAvoider's topic in General Health and Longevity
https://www.biorxiv.org/content/10.1101/2024.10.22.619522v1.full -
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
If someone could help me run httrack or wget on the site, that would be greatly appreciated!! I've included the urls.txt file. cookies.txt is not absolutely necessary, but you can register for an account, log in, use Claude to convert cookies.sqlite to cookies.txt, and try it here.
https://content.invisioncic.com/h253353/monthly_2022_11/image.png.33d7d1fc9f204905918b10aee0560c7e.png
^ here's a sample link to an image that needs to be included
Here are the options: https://www.httrack.com/html/fcguide.html
urls.txt -
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
let's just try this:
and what if the process terminates:
httrack refused to run after some time on DOCN yesterday; I wonder if it has to do with the CAPTCHA issue... -
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
My biggest fear:
find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2255
:/mnt/c/My Web Sites/crsociety2$ find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2299
== 938 MB now
When I update this, it INCREASES the number of pages with this error message, fuck. I have to use a proxy or something.
There's cyotek, which I can try. I just got a new VPN, but httrack's latest version doesn't even include "don't update already existing files"... -
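To see which pages an update turned into CAPTCHA pages, the two mirrors can be compared directly. A sketch, assuming a copy of the older mirror is still around; the function names and the two directory arguments are hypothetical, and the marker is the string quoted in the post.

```shell
# Marker text copied from the CAPTCHA pages described in the post.
CAPTCHA_MARKER="In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle"

# Print sorted relative paths of files under directory $1 containing the marker.
captcha_paths() {
  ( cd "$1" && grep -rlF -- "$CAPTCHA_MARKER" . | sort )
}

# Print pages that are CAPTCHA pages in the new mirror ($2)
# but were not CAPTCHA pages in the old mirror ($1).
new_captcha_pages() {
  old_list=$(mktemp)
  new_list=$(mktemp)
  captcha_paths "$1" > "$old_list"
  captcha_paths "$2" > "$new_list"
  # comm -13 keeps only the lines unique to the second (new) list.
  comm -13 "$old_list" "$new_list"
  rm -f "$old_list" "$new_list"
}
```

Pages listed this way are the ones where a good copy exists in the old mirror and was replaced by the error page in the new one, so they are worth restoring from the old copy rather than deleting.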
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
Managed to download A LOT of the pages within one day (and mostly preserve site organization). HOWEVER, a number of pages [2296 of them] have this error message in them (I suspect all pages with an updated date after a certain time):
So I have to re-run the scraper to include external images [+*.content.invisioncic.com/* and *imgur.com*], bulk-delete all pages that have the error message above, and then re-run the scraper a bit less aggressively.
====
I also need to figure out a way to make the scraper take in cookies (I know I once got it to take in cookies, but cookies have gotten more complicated since httrack was last updated). I also want to see if this will check threads (LIKE THIS ONE) for updates each time I run the scraper [and then figure out a place to upload it to, just in case crsociety.org goes down for good - HOPEFULLY IT WON'T].
God, invisionforum is such great software; it's better for organizing my thinking than any other, and it makes me wonder if I should put one in a DOCN droplet.
==
If I set active connections to 2, it slows the scraper down to a damn crawl. I need to increase it (but not to 10 at a time). -
68lb right/64 lb left. was 30kg in late 2021 [practically the same] I’m in between males and females [I may have low testosterone/DHT signalling which would make sense] on 2020-11-29, it was 25.5 kg, 58-60lbs at max God I had measurements earlier I can’t find.
-
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
It may be necessary to exclude reportComment and findComment links with the scan rules -*reportComment* and -*findComment*, b/c these links force httrack to go over way, way more URLs... and also -*getLastComment
===
https://poe.com/s/yer6HYKwjsFR6G1t6AXT
check https://www.archivebot.com/
===
AND *&tab=comment* [damnit, have to restart again]
so now the scan rules are:
Now there are a bunch of links with */tags/* in them, which might explode the number of possible links, idk. Whatever, running httrack remotely wouldn't have worked b/c I had to inspect which links crsociety was getting stuck on. Invisionboard is complex enough that it has all these extraneous links that clog up httrack [which I haven't used in years]... -
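The same exclusions can also be applied to a URL list before it is handed to a scraper. A minimal sketch: the patterns are the ones named in the post, the function name is made up, and dropping /tags/ links is an assumption, since the post only says they might be a problem.

```shell
# Read URLs on stdin and drop the link types that bloat the crawl:
# reportComment, findComment, getLastComment, tab=comment and /tags/ links.
filter_forum_urls() {
  grep -vE 'reportComment|findComment|getLastComment|[?&]tab=comment|/tags/' || true
}

# Usage (file names are placeholders):
#   filter_forum_urls < urls.txt > urls.filtered.txt
```

Comparing the line counts of the two files with wc -l shows how much of the list was noise.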
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
I'll just do it on my own PC: winhttrack with depth=3 [though depth=2 is much faster] and external-depth=1. Maybe this will be quick enough.
archivebot is still running:
https://www.crsociety.org/ on 10-28; 22,145.0 MB in 130,046 resp. at 0.7/s, 370,562 in q.; 1 con. w/ 1000 ms delay; igoff
9rrxic89n15t4tqb0f9qwaa8m
302 Connection closed. http://sci-hub.cc/10.1093/ije/dyw319
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1093/ije/dyw319’ encountered an error: Connection closed.
301 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
403 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
302 OK http://sci-hub.cc/10.1002/mnfr.201400446
302 Connection closed. http://sci-hub.cc/10.1002/mnfr.201400446
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1002/mnfr.201400446’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/10.1080/21551197.2017.1299659
302 Connection closed. http://sci-hub.cc/10.1080/21551197.2017.1299659
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1080/21551197.2017.1299659’ encountered an error: Connection closed.
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
200 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
302 OK http://sci-hub.cc/10.3945/ajcn.117.154294
302 Connection closed. http://sci-hub.cc/10.3945/ajcn.117.154294
ERROR Fetching ‘http://ww99.sci-hub.cc/10.3945/ajcn.117.154294’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/doi/10.3945/an.116.014431
302 Connection closed. http://sci-hub.cc/doi/10.3945/an.116.014431
ERROR Fetching ‘http://ww99.sci-hub.cc/doi/10.3945/an.116.014431’ encountered an error: Connection closed.
404 OK http://jrms.mui.ac.ir/files/journals/1/articles/10516/public/10516-39461-1-PB.pdf
301 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://www.onlinejacc.org/content/69/9/1116
301 OK http://www.onlinejacc.org/content/69/9/1116
403 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://sci-hub.cc/10.1111/ger.12265
302 Connection closed. http://sci-hub.cc/10.1111/ger.12265
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1111/ger.12265’ encountered an error: Connection closed.
301 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
200 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
200 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
-
crsociety.org FINALLY got back online after 4 months
Alex K Chen replied to Alex K Chen's topic in Chitchat
And it took forever.. I figured out it had to do with the ext-depth not being limited to 1. Then I kept trying to restart httrack, to no avail, even after rebooting the unix system, damnit.
nohup httrack --depth=2 --ext-depth=1 --path "./websites" \
  --robots=0 --keep-alive \
  --cookies=httrack_cookies.txt \
  --mirror \
  -%v \
  -iC8 \
  --timeout=60 \
  --retries=3 \
  -O "./websites" \
  --file-log \
  --error-log=httrack_errors.log \
  -%L "urls.txt" \
  "+*.content.invisioncic.com/*" \
  "+*crsociety.org/*" \
  "+*www.crsociety.org/*" \
  > httrack.log 2>&1 &
echo $! > httrack.pid
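Since the command above writes the background job's PID to httrack.pid, a small check can tell whether the scraper is still alive before trying to restart it. This is a sketch, not part of the original command: the function name is made up, and it only tests that some process with that PID exists, so a stale file whose PID has been reused would give a false positive.

```shell
# Return 0 if the process whose PID is stored in file $1 is still running.
pid_file_alive() {
  [ -f "$1" ] || return 1
  pid=$(cat "$1")
  # kill -0 sends no signal; it only checks that the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if pid_file_alive httrack.pid; then
  echo "httrack is still running"
else
  echo "httrack is not running; check httrack.log before restarting"
fi
```

Putting the check in a cron job or a sleep loop would catch the case where the process terminates overnight.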
