  • Posts

    1,201
Profile Information

  • Location
    Cambridge, MA

Recent Profile Visitors

4,559 profile views

  1. find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
     2310
     Ugh. To delete those files, add -exec rm {} + after the grep (WITHOUT the | wc -l). I'm going to try rerunning now; some time has passed since I last did it.
  2. They're *older* so we don't know their LDL when younger, but high-risk people are at way higher risk of actual heart attacks/arteriosclerosis..
  3. I'm asking someone whether they could properly scrape crsociety using multiple proxies to bypass the captchas. There might be a price associated with it; we're still trying to figure this out..
     ==
     archivebot still running: https://www.crsociety.org/ on 10-28; 174,325.9 MB in 536,936 resp. at 0.7/s, 264,419 in q.; 1 con. w/ 1000 ms delay; igoff
     ==
     If you search by user, try https://www.crsociety.org/profile/5068-alex-k-chen/content/page/43/?type=forums_topic_post [pages 1 to 43...]
  4. Mileage also shows large LDL particles really decrease mortality
  5. https://www.biorxiv.org/content/10.1101/2024.10.22.619522v1.full
  6. If someone could help me run httrack or wget on the site, that would be greatly appreciated!! I've included the urls.txt file. cookies.txt is not absolutely necessary, but if you want it: register for an account, log in, use Claude to convert cookies.sqlite to cookies.txt, and try it here.
     Here's a sample link to an image that needs to be included: https://content.invisioncic.com/h253353/monthly_2022_11/image.png.33d7d1fc9f204905918b10aee0560c7e.png
     Here are the options: https://www.httrack.com/html/fcguide.html
     urls.txt
  7. let's just try this: and what if the process terminates: httrack refused to run after some time on DOCN yesterday, I wonder if it has to do with the captcha issue...
  8. My biggest fear:
     find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
     2255
     :/mnt/c/My Web Sites/crsociety2$ find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
     2299
     ==
     938 MB now
     When I update this, it INCREASES the number of pages with this error message, fuck. I have to use a proxy or something. There's cyotek, which I can try. I just got a new VPN, but httrack's latest version doesn't even include "don't update already existing files"...
  9. Managed to download A LOT of the pages within one day (and mostly preserve site organization). HOWEVER, a number of pages [2296 of them] have this error message in them (I suspect all pages with an updated date after a certain time).
     So I have to re-run the scraper to include external images [+*.content.invisioncic.com/*] and *imgur.com*, bulk-delete all pages that have that error message, and then re-run the scraper a bit less aggressively.
     ====
     I also need to figure out a way to make the scraper take in cookies (I know I once got it to take in cookies, but cookies have gotten more complicated since httrack was last updated). I also want to see whether this will check threads (LIKE THIS ONE) for updates each time I run the scraper [and then figure out a place to upload it to, just in case crsociety.org goes down for good - HOPEFULLY IT WON'T].
     God, invisionforum is such great software; it's better for organizing my thinking than any other. It makes me wonder if I should put one in a DOCN droplet.
     ==
     If I set active connections to 2, it slows the scraper down to a damn crawl. I need to increase it (but not to 10 at a time).
  10. Well how old is the apartment you live in? Boston got rid of most of its lead in its pipes, I think
  11. 68 lb right / 64 lb left. It was 30 kg in late 2021 [practically the same]. I'm in between males and females [I may have low testosterone/DHT signalling, which would make sense].
      On 2020-11-29, it was 25.5 kg, 58-60 lbs at max.
      God, I had earlier measurements that I can't find.
  12. It may be necessary to exclude links with -*reportComment* and -*findComment*, b/c these links force httrack to go over way, way more URLs... and -*getLastComment
      ===
      https://poe.com/s/yer6HYKwjsFR6G1t6AXT
      check https://www.archivebot.com/
      ===
      AND *&tab=comment* [damn it, have to restart again]
      So now the scan rules are:
      Now there are a bunch of links with */tags/* in them, which might explode the number of possible links, idk. Whatever, running httrack remotely wouldn't have worked b/c I had to inspect which links crsociety was getting stuck on. Invisionboard is complex enough that it has all these extraneous links that clog up httrack [which I haven't used in years]...
  13. I'll just do it on my own PC: winhttrack with depth = 3 [though depth=2 is much faster] and external-depth=1. Maybe this will be quick enough.
      archivebot is still running https://www.crsociety.org/ on 10-28; 22,145.0 MB in 130,046 resp. at 0.7/s, 370,562 in q.; 1 con. w/ 1000 ms delay; igoff
      9rrxic89n15t4tqb0f9qwaa8m
      302 Connection closed. http://sci-hub.cc/10.1093/ije/dyw319
      ERROR Fetching ‘http://ww99.sci-hub.cc/10.1093/ije/dyw319’ encountered an error: Connection closed.
      301 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
      403 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
      302 OK http://sci-hub.cc/10.1002/mnfr.201400446
      302 Connection closed. http://sci-hub.cc/10.1002/mnfr.201400446
      ERROR Fetching ‘http://ww99.sci-hub.cc/10.1002/mnfr.201400446’ encountered an error: Connection closed.
      302 OK http://sci-hub.cc/10.1080/21551197.2017.1299659
      302 Connection closed. http://sci-hub.cc/10.1080/21551197.2017.1299659
      ERROR Fetching ‘http://ww99.sci-hub.cc/10.1080/21551197.2017.1299659’ encountered an error: Connection closed.
      301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
      200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
      301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
      301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
      301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
      200 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
      302 OK http://sci-hub.cc/10.3945/ajcn.117.154294
      302 Connection closed. http://sci-hub.cc/10.3945/ajcn.117.154294
      ERROR Fetching ‘http://ww99.sci-hub.cc/10.3945/ajcn.117.154294’ encountered an error: Connection closed.
      302 OK http://sci-hub.cc/doi/10.3945/an.116.014431
      302 Connection closed. http://sci-hub.cc/doi/10.3945/an.116.014431
      ERROR Fetching ‘http://ww99.sci-hub.cc/doi/10.3945/an.116.014431’ encountered an error: Connection closed.
      404 OK http://jrms.mui.ac.ir/files/journals/1/articles/10516/public/10516-39461-1-PB.pdf
      301 OK http://www.onlinejacc.org/content/69/9/1116
      302 OK http://www.onlinejacc.org/content/69/9/1116
      301 OK http://www.onlinejacc.org/content/69/9/1116
      403 OK http://www.onlinejacc.org/content/69/9/1116
      302 OK http://sci-hub.cc/10.1111/ger.12265
      302 Connection closed. http://sci-hub.cc/10.1111/ger.12265
      ERROR Fetching ‘http://ww99.sci-hub.cc/10.1111/ger.12265’ encountered an error: Connection closed.
      301 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
      200 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
      301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
      200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
      302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
      302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
      302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
      200 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
  14. And it took forever.. and I figured out it had to do with the ext-depth not being limited to 1. Then I kept trying to restart httrack, to no avail, even after rebooting the unix system, damn it.
      nohup httrack --depth=2 --ext-depth=1 --path "./websites" \
        --robots=0 --keep-alive \
        --cookies=httrack_cookies.txt \
        --mirror \
        -%v \
        -iC8 \
        --timeout=60 \
        --retries=3 \
        -O "./websites" \
        --file-log \
        --error-log=httrack_errors.log \
        -%L "urls.txt" \
        "+*.content.invisioncic.com/*" \
        "+*crsociety.org/*" \
        "+*www.crsociety.org/*" \
        > httrack.log 2>&1 &
      echo $! > httrack.pid
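
On the "what if the process terminates" question from post 7: the httrack.pid file written by the command in post 14 is enough for a basic liveness check. What follows is only a minimal sketch, assuming the PID file sits in the current directory and the job runs under the same user; `is_running` is a hypothetical helper name, not something httrack provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PID stored in the given file is alive.
is_running() {
    pidfile=$1
    # A missing or empty PID file means there is nothing to check.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only reports whether the process exists
    # (it also fails if the process belongs to another user).
    kill -0 "$pid" 2>/dev/null
}

if is_running httrack.pid; then
    echo "httrack still running (pid $(cat httrack.pid))"
else
    echo "httrack not running; check httrack.log before restarting"
fi
```

Run from cron or a loop, this could flag a dead mirror job. It does not say why the job died, so httrack.log and httrack_errors.log still need a look.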
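
The count-then-delete step from posts 1, 8 and 9 can also be done with one recursive grep instead of one grep per file. This is a sketch, assuming GNU grep and xargs (for -Z, -0 and -r) and a shortened substring of the error message as the marker; `list_captcha_pages` and `delete_captcha_pages` are hypothetical helper names.

```shell
#!/bin/sh
# Fixed substring of the CAPTCHA error page quoted in the posts above.
MARKER="verify that you're not a robot by solving a CAPTCHA puzzle"

# Print every file under the given directory that contains the marker.
list_captcha_pages() {
    grep -rlF -- "$MARKER" "$1"
}

# Delete those files. NUL-separated names (-Z / -0) keep paths with spaces,
# such as "My Web Sites", intact; -r skips rm when nothing matches.
delete_captcha_pages() {
    grep -rlZF -- "$MARKER" "$1" | xargs -0 -r rm --
}
```

To count first, run `list_captcha_pages . | wc -l` from the mirror directory; `delete_captcha_pages .` then removes the same files, so the next scraper run can fetch them again.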