Alex K Chen Posted October 9 (edited) What happened in the interim? Can we get a proper backup so we don't risk losing all our content? [HTTrack works wonderfully well for backing up forums] [also, crsociety still won't load on my Galaxy S24 or in the Arc browser on my MacBook] Edited October 10 by Alex K Chen
Pablo Stafforini Posted October 9 This is great news!
IgorF Posted October 9 Thanks a lot for making it back! Br, Igor
Dean Pomerleau Posted October 9 Any idea how it came back online? I haven't heard anything from the admin @James Cain about him doing anything.
mccoy Posted October 9 Hi all, hope everyone is well. It's good that the forum has been resuscitated; there is some good reference material in here.
Dean Pomerleau Posted October 9 Hey @mccoy! Nice to hear from you again. I agree. It would be a shame to lose all the archives. I wonder if @Saul knows it's back.
Gordo Posted October 14 Cool, I was trying to check in every couple of weeks just to see if it would come back online, but I had started to think it was gone forever. Thanks to whoever paid the bill to turn it back on! I like the community here. Agree it would be nice to get a searchable archive somehow out there somewhere on the web. Also wondering if there is a free hosting option so this doesn't happen again?
Dean Pomerleau Posted October 14 Hey Gordo! A little background on what happened. The CR Society treasurer David Stern died a while back (RIP). There was no transition plan and he was the only one with access to the CRS bank account and credit card. The bill for hosting the forum eventually went unpaid when the credit card expired. James tried several times, but as far as I could tell none of the remaining CRS Board members seemed able/willing to gain access to the money so we could start paying the bill again (I was CCed on the emails). I'm still not sure how the issue was resolved and who is paying the bill now. There is talk on the email chain about having Longecity.org take over hosting for a one-time payment from the CR Society, but I'm not sure where that stands. I'll try to find out more. --Dean
Ron Put Posted October 15 (edited) Thanks for the update, Dean! I was wondering where the site went... Sad to hear about a member's death, RIP. Good to see everyone else in here; amazingly, you all still look the same! Edited October 15 by Ron Put
Alex K Chen (thread author) Posted October 15 How robust are longecity.org's finances/admins? There is a lot of activity on Longecity, but the S/N ratio is much poorer and the quality isn't as good as before. Still, I'm glad it exists and survives.
Dean Pomerleau Posted October 15 I haven't heard back from my inquiry to folks on the CR Board asking about how the forum came back online and what exactly is the plan with Longecity. I asked them if someone would post here to share the details. We'll see if they do...
Gordo Posted October 16 On 10/15/2024 at 3:43 PM, Dean Pomerleau said: "I haven't heard back from my inquiry to folks on the CR Board asking about how the forum came back online and what exactly is the plan with Longecity. I asked them if someone would post here share the details. We'll see if they do..." I'd love to know what the monthly costs are (burn rate) and how much $ is left in the CRSociety account, so we can know in advance approximately when the forum will disappear again absent a new cash infusion or fundraising.
Dean Pomerleau Posted October 16 19 minutes ago, Gordo said: "I'd love to know what the monthly costs are (burn rate) and how much $ is left in the CRSociety account, so we can know in advance, approximately when the forum will disappear again absent a new cash infusion or fundraising." As I recall, the CRS had plenty of money in its bank account to pay hosting costs for many years, assuming someone has been able to tap into it again. BTW, I still haven't heard back from anyone (Brian, Michael, James, Bob) about the plan going forward.
Alex K Chen (thread author) Posted October 16 (edited) Brian has been going through some things but should be available again soon. Edited October 16 by Alex K Chen
5fp4 Posted Sunday at 04:12 PM Ahh... We're back.
Alex K Chen (thread author) Posted Monday at 12:27 PM (edited) Meanwhile, can someone properly scrape crsociety.org so we don't lose everything, just in case? WinHTTrack is one solution I've found that works, but I've become rusty and need time to properly learn it again. You can also easily code up a scraper with Claude 3.5 Sonnet, and there's a tool called Firecrawl: https://x.com/ericciarla/status/1850958136531968465 Edited Monday at 06:24 PM by Alex K Chen
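As a rough sketch of what "coding up a scraper" could look like with only Python's standard library (this is not Firecrawl and not how HTTrack works internally), the script below downloads a list of page URLs one at a time and saves each to a flat file name. The output folder, the User-Agent string, and the 2-second delay are arbitrary assumptions. It saves raw HTML only; it does not follow links, fetch images, or rewrite links the way HTTrack does.

```python
import re
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Turn a page URL into a flat, filesystem-safe .html file name."""
    parts = urlsplit(url)
    # Keep host, path and query; the #fragment (e.g. "#comments") is dropped.
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "_" + parts.query
    # Collapse every run of characters other than letters, digits, "." and "-" into "_".
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return safe + ".html"


def fetch_all(urls, out_dir: str = "crsociety_backup", delay_seconds: float = 2.0) -> None:
    """Save the raw HTML of each URL, pausing between requests to go easy on the server."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url_to_filename(url)
        if target.exists():
            continue  # already saved on an earlier run, so the script can be resumed
        request = urllib.request.Request(url, headers={"User-Agent": "personal-forum-backup"})
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                target.write_bytes(response.read())
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            print(f"failed: {url} ({exc})")
        time.sleep(delay_seconds)
```

For example, `fetch_all(Path("urls.txt").read_text().split())` would work through a hypothetical `urls.txt` holding one URL per line.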
Dean Pomerleau Posted Monday at 12:45 PM If you want it done and have the technical chops, I suggest you go ahead and do it. I think it highly unlikely anyone else will. Heck, I still haven't been able to find out who brought the forum back online and who is paying to keep it that way. I wouldn't be surprised if it went away for good without any notice.
Alex K Chen (thread author) Posted Monday at 09:22 PM OK, HTTrack's output is really weird and distorted, but scraping software of the future will make this irrelevant.
=== Anyway, we start off like this: ===
AC: generate URLs from https://www.crsociety.org/forum/6-general-health-and-longevity/page/1/ to https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/
Claude: I'll help you generate URLs for all 48 pages of that forum section. I'll create a code artifact to generate these URLs systematically. [Artifact: Forum URL Generator] The code will generate all 48 URLs in the format you requested. Each URL follows the pattern specified, starting from page 1 and going through page 48. The URLs are: https://www.crsociety.org/forum/6-general-health-and-longevity/page/1/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/2/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/3/ ...and so on through... https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/ Would you like me to output all 48 URLs explicitly, or would this summary suffice for your needs?
AC: output all 48 urls explicitly
Claude: Here are all 48 URLs explicitly:
https://www.crsociety.org/forum/6-general-health-and-longevity/page/1/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/2/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/3/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/4/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/5/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/6/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/7/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/8/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/9/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/10/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/11/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/12/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/13/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/14/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/15/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/16/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/17/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/18/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/19/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/20/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/21/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/22/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/23/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/24/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/25/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/26/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/27/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/28/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/29/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/30/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/31/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/32/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/33/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/34/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/35/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/36/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/37/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/38/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/39/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/40/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/41/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/42/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/43/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/44/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/45/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/46/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/47/
https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/
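The pattern Claude is filling in here is simple enough to generate locally, which avoids pasting long lists back and forth. A minimal sketch, assuming the index URLs keep the `/forum/<id-slug>/page/<n>/` form shown in these prompts; the page counts in `FORUMS` (48, 28, 21, 15) are the ones used in this thread and will drift as new threads are posted. The names `FORUMS`, `forum_page_urls` and `all_forum_urls` are illustrative.

```python
BASE = "https://www.crsociety.org/forum"

# Last index page for each forum section, as used in the prompts in this post.
FORUMS = {
    "6-general-health-and-longevity": 48,
    "5-cr-practice": 28,
    "9-chitchat": 21,
    "4-cr-science-theory": 15,
}


def forum_page_urls(forum_slug: str, last_page: int, descending: bool = False) -> list[str]:
    """List /forum/<slug>/page/<n>/ for n = 1..last_page (or last_page..1 if descending)."""
    pages = range(last_page, 0, -1) if descending else range(1, last_page + 1)
    return [f"{BASE}/{forum_slug}/page/{n}/" for n in pages]


def all_forum_urls() -> list[str]:
    """Concatenate the index pages of every section in FORUMS."""
    urls: list[str] = []
    for slug, last in FORUMS.items():
        urls.extend(forum_page_urls(slug, last))
    return urls
```

`print("\n".join(all_forum_urls()))` prints all 112 index URLs, one per line, ready to save to a text file.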
AC: generate list of URLs from https://www.crsociety.org/forum/5-cr-practice/page/1/ to https://www.crsociety.org/forum/5-cr-practice/page/28/
Claude: Here are all 28 URLs for the CR Practice forum section:
https://www.crsociety.org/forum/5-cr-practice/page/1/
https://www.crsociety.org/forum/5-cr-practice/page/2/
https://www.crsociety.org/forum/5-cr-practice/page/3/
https://www.crsociety.org/forum/5-cr-practice/page/4/
https://www.crsociety.org/forum/5-cr-practice/page/5/
https://www.crsociety.org/forum/5-cr-practice/page/6/
https://www.crsociety.org/forum/5-cr-practice/page/7/
https://www.crsociety.org/forum/5-cr-practice/page/8/
https://www.crsociety.org/forum/5-cr-practice/page/9/
https://www.crsociety.org/forum/5-cr-practice/page/10/
https://www.crsociety.org/forum/5-cr-practice/page/11/
https://www.crsociety.org/forum/5-cr-practice/page/12/
https://www.crsociety.org/forum/5-cr-practice/page/13/
https://www.crsociety.org/forum/5-cr-practice/page/14/
https://www.crsociety.org/forum/5-cr-practice/page/15/
https://www.crsociety.org/forum/5-cr-practice/page/16/
https://www.crsociety.org/forum/5-cr-practice/page/17/
https://www.crsociety.org/forum/5-cr-practice/page/18/
https://www.crsociety.org/forum/5-cr-practice/page/19/
https://www.crsociety.org/forum/5-cr-practice/page/20/
https://www.crsociety.org/forum/5-cr-practice/page/21/
https://www.crsociety.org/forum/5-cr-practice/page/22/
https://www.crsociety.org/forum/5-cr-practice/page/23/
https://www.crsociety.org/forum/5-cr-practice/page/24/
https://www.crsociety.org/forum/5-cr-practice/page/25/
https://www.crsociety.org/forum/5-cr-practice/page/26/
https://www.crsociety.org/forum/5-cr-practice/page/27/
https://www.crsociety.org/forum/5-cr-practice/page/28/
AC: generate list of URLs from https://www.crsociety.org/forum/9-chitchat/page/21/ to https://www.crsociety.org/forum/9-chitchat/page/1/
Claude: Here are the URLs in descending order from page 21 to page 1:
https://www.crsociety.org/forum/9-chitchat/page/21/
https://www.crsociety.org/forum/9-chitchat/page/20/
https://www.crsociety.org/forum/9-chitchat/page/19/
https://www.crsociety.org/forum/9-chitchat/page/18/
https://www.crsociety.org/forum/9-chitchat/page/17/
https://www.crsociety.org/forum/9-chitchat/page/16/
https://www.crsociety.org/forum/9-chitchat/page/15/
https://www.crsociety.org/forum/9-chitchat/page/14/
https://www.crsociety.org/forum/9-chitchat/page/13/
https://www.crsociety.org/forum/9-chitchat/page/12/
https://www.crsociety.org/forum/9-chitchat/page/11/
https://www.crsociety.org/forum/9-chitchat/page/10/
https://www.crsociety.org/forum/9-chitchat/page/9/
https://www.crsociety.org/forum/9-chitchat/page/8/
https://www.crsociety.org/forum/9-chitchat/page/7/
https://www.crsociety.org/forum/9-chitchat/page/6/
https://www.crsociety.org/forum/9-chitchat/page/5/
https://www.crsociety.org/forum/9-chitchat/page/4/
https://www.crsociety.org/forum/9-chitchat/page/3/
https://www.crsociety.org/forum/9-chitchat/page/2/
https://www.crsociety.org/forum/9-chitchat/page/1/
AC: https://www.crsociety.org/forum/4-cr-science-theory/page/15/ from 1 to 15
Claude: Here are the URLs for the CR Science & Theory forum from page 1 to 15:
https://www.crsociety.org/forum/4-cr-science-theory/page/1/
https://www.crsociety.org/forum/4-cr-science-theory/page/2/
https://www.crsociety.org/forum/4-cr-science-theory/page/3/
https://www.crsociety.org/forum/4-cr-science-theory/page/4/
https://www.crsociety.org/forum/4-cr-science-theory/page/5/
https://www.crsociety.org/forum/4-cr-science-theory/page/6/
https://www.crsociety.org/forum/4-cr-science-theory/page/7/
https://www.crsociety.org/forum/4-cr-science-theory/page/8/
https://www.crsociety.org/forum/4-cr-science-theory/page/9/
https://www.crsociety.org/forum/4-cr-science-theory/page/10/
https://www.crsociety.org/forum/4-cr-science-theory/page/11/
https://www.crsociety.org/forum/4-cr-science-theory/page/12/
https://www.crsociety.org/forum/4-cr-science-theory/page/13/
https://www.crsociety.org/forum/4-cr-science-theory/page/14/
https://www.crsociety.org/forum/4-cr-science-theory/page/15/
=== With max-depth=2, this is enough to grab all threads with 6 or fewer pages. However, it is not enough to grab all threads with more than 6 pages, especially those with more than 10 pages, where even max-depth=3 is not sufficient. == I tried this prompt, but it's not sophisticated enough to generate all thread URLs that have more than 6 pages of replies. [There is a link rule one could use to fetch all threads with more than 6 pages if one feeds the page count of a thread into a variable, but because the thread titles aren't arranged in numerical order, the task becomes harder.]
if any page in https://www.crsociety.org/topic/ (like [$THREADTITLE=https://www.crsociety.org/topic/17700-glucose-versus-cholesterol-optimization/page/6/]) has page/$HIGHESTNUMBER with $HIGHESTNUMBER >=6, generate all pages from https://www.crsociety.org/topic/$THREADTITLE/page/1/ to https://www.crsociety.org/topic/$THREADTITLE/page/$HIGHESTNUMBER/ for example, in this thread https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/#comments, $THREADTITLE=17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic and $HIGHESTNUMBER=137, so generate the URLs of all 137 pages of this thread
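The rule in that prompt can also be expressed directly in code. A minimal sketch: pull `$THREADTITLE` and `$HIGHESTNUMBER` out of a single thread URL with a regular expression and, if the highest page is at least 6, list every page from 1 to that number. The regex and the function name `expand_thread` are illustrative assumptions, and the sketch does not check whether page 1 is actually served at `/page/1/` rather than at the bare thread URL.

```python
import re

# Group 1 is $THREADTITLE (the "<id>-<slug>" segment); group 2 is $HIGHESTNUMBER.
THREAD_PAGE_RE = re.compile(r"https?://www\.crsociety\.org/topic/([^/?#]+)/page/(\d+)")


def expand_thread(url: str, min_pages: int = 6) -> list[str]:
    """Return every page URL of a thread if the given URL shows page >= min_pages."""
    match = THREAD_PAGE_RE.match(url)
    if match is None:
        return []  # not a /topic/.../page/N URL (profiles, forum indexes, etc.)
    title, highest = match.group(1), int(match.group(2))
    if highest < min_pages:
        return []
    return [f"https://www.crsociety.org/topic/{title}/page/{n}/" for n in range(1, highest + 1)]
```

For the pandemic thread's `/page/137/#comments` link, this yields 137 URLs, `page/1/` through `page/137/`.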
Alex K Chen (thread author) Posted Monday at 09:32 PM (edited) To generate ALL URLs from an index file, use this prompt in Claude: if any page in https://www.crsociety.org/topic/ (like [$THREADTITLE=https://www.crsociety.org/topic/17700-glucose-versus-cholesterol-optimization/page/6/]) has page/$HIGHESTNUMBER with $HIGHESTNUMBER >=6, generate all pages from https://www.crsociety.org/topic/$THREADTITLE/page/1/ to https://www.crsociety.org/topic/$THREADTITLE/page/$HIGHESTNUMBER/ for example, in this thread https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/#comments, $THREADTITLE=17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic and $HIGHESTNUMBER=137, so generate the URLs of all 137 pages of this thread
==== AC: from "https://www.crsociety.org/profile/7147-gordo/ https://www.crsociety.org/topic/18714-its-time-to-science-the-sh-out-of-dunedinpace/?do=getLastComment https://www.crsociety.org/topic/18450-epigenetic-test-9-finally-a-younger-horvath-age-than-the-chronological/ https://www.crsociety.org/profile/8107-mike-lustgarten/ https://www.crsociety.org/profile/8107-mike-lustgarten/ https://www.crsociety.org/topic/18450-epigenetic-test-9-finally-a-younger-horvath-age-than-the-chronological/?do=getLastComment https://www.crsociety.org/topic/18271-metabolomics-thread/ https://www.crsociety.org/profile/5068-alex-k-chen/ https://www.crsociety.org/profile/5068-alex-k-chen/ https://www.crsociety.org/topic/18271-metabolomics-thread/?do=getLastComment https://www.crsociety.org/topic/18707-amphetamine-neurotoxicity-or-the-opposite/ https://www.crsociety.org/profile/5068-alex-k-chen/ https://www.crsociety.org/profile/5068-alex-k-chen/ https://www.crsociety.org/topic/18707-amphetamine-neurotoxicity-or-the-opposite/?do=getLastComment https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/#comments https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/2/#comments https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/3/#comments https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/4/#comments https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/#comments https://www.crsociety.org/profile/7147-gordo/ https://www.crsociety.org/profile/7147-gordo/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/?do=getLastComment https://www.crsociety.org/topic/18267-kynureninetryptophan-ratio-more-insight-into-the-de-novo-nad-synthesis-pathway/ https://www.crsociety.org/profile/8107-mike-lustgarten/ https://www.crsociety.org/profile/8107-mike-lustgarten/ https://www.crsociety.org/topic/18267-kynureninetryptophan-ratio-more-insight-into-the-de-novo-nad-synthesis-pathway/?do=getLastComment https://www.crsociety.org/topic/18244-the-walnuts-thread-theyre-the-weird-nut/ https://www.crsociety.org/profile/5068-alex-k-chen/" where a sample thread is https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137 [with label https://www.crsociety.org/topic/$threadtitle/page/$MAXPAGES], grab all URLs that have page/$MAXPAGES and copy that number into an array MAXDEPTH[$threadtitle]=$MAXPAGES
Claude: I'll analyze those URLs and identify threads with page numbers, creating a mapping of thread titles to their maximum page depths. For URLs that follow the pattern /topic/{threadtitle}/page/{number}, I'll store the maximum page number.
From the provided URLs, here are the threads with their maximum page depths:
MAXDEPTH["17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic"] = 137
This is the only thread in your provided list that clearly shows its maximum page depth. While there are other thread URLs in your list, they don't include page numbers in the format /page/{number}, so we can't determine their maximum depths from this data. Would you like me to look for additional thread URLs with page numbers, or would you like me to help analyze another set of URLs?
AC: [pasted URL list, beginning with https://www.crsociety.org/forum/6-general-health-and-longevity/?sortby=posts&sortdirection=desc# https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/] how about this list then? For threads with $MAXPAGES >=6, generate a list of ALL URLs from /topic/{threadtitle}/page/6:$MAXPAGES
Claude: From this list, I can identify threads with their maximum page depths:
MAXDEPTH["17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic"] = 137
MAXDEPTH["11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses"] = 43
MAXDEPTH["11719-olive-oil-healthy-or-not"] = 17
MAXDEPTH["17649-vaccine-risks-injury-and-related-topics"] = 10
MAXDEPTH["17700-glucose-versus-cholesterol-optimization"] = 6
For threads with pages ≥ 6, here are all URLs from page 6 to their maximum:
For thread "17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic" (pages 6-137):
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/6/
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/7/
[...]
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/
For thread "11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses" (pages 6-43):
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/6/
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/7/
[...]
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/43/
For thread "11719-olive-oil-healthy-or-not" (pages 6-17):
https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/6/
https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/7/
[...]
https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/17/
For thread "17649-vaccine-risks-injury-and-related-topics" (pages 6-10):
https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/6/
https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/7/
https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/8/
https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/9/
https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/10/
For thread "17700-glucose-versus-cholesterol-optimization" (page 6 only):
https://www.crsociety.org/topic/17700-glucose-versus-cholesterol-optimization/page/6/
Would you like me to expand the [...] sections to show all page URLs explicitly for any particular thread?
AC: output all these threads as a text file
Edited Monday at 09:33 PM by Alex K Chen
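Instead of asking Claude to expand the `[...]` sections, the five MAXDEPTH values from its answer can be expanded and written to a text file locally. This sketch assumes those values are correct; Claude could not check the live forum, so they are worth spot-checking in a browser. The file name `extra_urls.txt` and the helper names are hypothetical.

```python
from pathlib import Path

BASE = "https://www.crsociety.org/topic"

# MAXDEPTH values exactly as Claude reported them (not re-checked against the live forum).
MAXDEPTH = {
    "17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic": 137,
    "11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses": 43,
    "11719-olive-oil-healthy-or-not": 17,
    "17649-vaccine-risks-injury-and-related-topics": 10,
    "17700-glucose-versus-cholesterol-optimization": 6,
}


def extra_page_urls(maxdepth: dict[str, int], start: int = 6) -> list[str]:
    """Expand every thread to pages start..max, in the order the threads are listed."""
    urls: list[str] = []
    for title, last in maxdepth.items():
        urls.extend(f"{BASE}/{title}/page/{n}/" for n in range(start, last + 1))
    return urls


def write_url_file(urls: list[str], path: str) -> None:
    """Write one URL per line, the plain-text format a crawler's URL list usually expects."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

`write_url_file(extra_page_urls(MAXDEPTH), "extra_urls.txt")` writes 188 URLs: 132 + 38 + 12 + 5 + 1 for the five threads.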
Alex K Chen (thread author) Posted Monday at 09:41 PM (edited) There are only a small handful of threads where $MAXPAGES >= 6, so I can manually copy all of these, create a list of URLs just for that handful of threads, and put them in a text file to feed into HTTrack. Then I can use standard HTTrack with depth=2 to archive all the threads with $MAXPAGES of 5 or fewer the standard way, and add in a .txt file containing all the extra URLs. It's a bit forced/inelegant, but Claude makes the list very easy and quick to create, so I'll do it. I had a huge ugh factor with WinHTTrack for a long time because of the $MAXPAGES issue across many forums. https://x.com/firecrawl_dev/status/1850934833498775816 Edited Monday at 09:48 PM by Alex K Chen
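That plan combines two URL sources: the seeds for the depth=2 crawl and a .txt file of extra thread pages. Here is a minimal sketch of merging several such lists into one seed file. It drops `#comments`-style fragments and exact duplicates (the pasted list earlier in the thread repeats several profile URLs, for example) while keeping the original order. The function names are hypothetical. HTTrack can read start URLs from a text file with one URL per line, but check `httrack --help` for the exact option on your version.

```python
from pathlib import Path
from urllib.parse import urldefrag


def read_url_file(path: str) -> list[str]:
    """Read one URL per line from a text file."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def merge_url_lists(*url_lists: list[str]) -> list[str]:
    """Merge URL lists in order, dropping blank lines, #fragments and repeats."""
    seen: set[str] = set()
    merged: list[str] = []
    for urls in url_lists:
        for raw in urls:
            url = urldefrag(raw.strip()).url  # ".../page/4/#comments" -> ".../page/4/"
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def write_seed_file(urls: list[str], path: str) -> None:
    """Write the merged list back out, one URL per line."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

For instance, `write_seed_file(merge_url_lists(read_url_file("forum_index_urls.txt"), read_url_file("extra_urls.txt")), "seeds.txt")`, with all three file names hypothetical.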
Dean Pomerleau Posted Monday at 09:48 PM Sounds good, Alex. What are you going to do with the scraped data once you've got it?
Alex K Chen (thread author) Posted Monday at 09:49 PM (edited) Keep it on a hard drive for as long as I can [I also need to get better at maintaining/properly backing up my own hard drives], but maybe send it to ArchiveTeam or someone who cares about LLMs+health at some point. Edited Monday at 09:49 PM by Alex K Chen
Dean Pomerleau Posted Monday at 09:54 PM Will your scrape capture images in posts too?
Alex K Chen (thread author) Posted Monday at 10:00 PM (edited) It's not great at capturing formatting; it breaks the formatting. Whether images survive depends on whether they're internal or external URLs *and* on how they're embedded. Give me a thread with lots of pictures and I'll see how to keep the images in it.
==== https://claude.site/artifacts/2f62c162-e969-4d03-8074-9d940b341a59 ===
For generating page sequences for threads that are 6+ pages long: Claude artifacts generate responses that are too long, so you have to transfer the code to Replit.
===
Claude: I'll create a Python script that can parse URLs and generate page sequences for threads. This will work in Replit: [Artifact: URL Parser and Generator]
To use this script in Replit:
1. Create a new repl and select Python as the language.
2. Copy the artifact code into main.py.
3. Create a file called urls.txt and paste your list of URLs into it (one URL per line).
4. Run the script.
The script will:
- Read URLs from urls.txt
- Extract thread titles and their maximum page numbers
- For threads with 6 or more pages, generate all URLs from page 6 to the maximum page
- Print the results
Features:
- Uses regex to extract thread IDs and page numbers
- Handles malformed URLs gracefully
- Only processes threads with 6 or more pages
- Generates clean URLs without the #comments suffix
- Uses type hints for better code readability
- Includes error handling for file operations
You can modify the code to output to a file instead of printing to the console by changing the print statements to file writes if needed. Would you like me to add that functionality as well?
== Now to get multipage threads:
https://www.crsociety.org/forum/4-cr-science-theory/?sortby=posts&sortdirection=desc [only 3]
https://www.crsociety.org/forum/9-chitchat/?sortby=posts&sortdirection=desc [only 3]
https://www.crsociety.org/forum/6-general-health-and-longevity/?sortby=posts&sortdirection=desc [the only forum with a significant number of threads with 6+ pages, and even then only 5 threads; I got Claude to generate the list, but it keeps exceeding the maximum length for artifacts]
https://www.crsociety.org/forum/5-cr-practice/?sortby=posts&sortdirection=desc [only 3]
== We're so lucky the forums went back up after Claude 3.5.1 Sonnet came out; otherwise it would have taken me far more time to set my intention right, and many middle threads might have gotten lost. ==
== The other issue is threads with attachments in them.
Edited Monday at 11:40 PM by Alex K Chen
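Claude's Replit script itself is only linked as an artifact. The sketch below was written independently to match the behaviour it describes and is not Claude's actual code. It reads `urls.txt`, finds each thread's highest page with a regex, skips profile links and other malformed lines, and for threads with 6 or more pages lists pages 6 through the maximum as clean URLs without `#comments`. It also writes the result to `extra_urls.txt`, the file-output option Claude offered; that file name is an assumption.

```python
import re
import sys
from pathlib import Path

BASE = "https://www.crsociety.org/topic"
# Captures <thread-title> and <n> from ".../topic/<thread-title>/page/<n>"; a trailing "/#comments" is ignored.
THREAD_PAGE_RE = re.compile(r"/topic/([^/?#]+)/page/(\d+)")


def max_pages(urls: list[str]) -> dict[str, int]:
    """Map each thread title to the highest page number seen in the URL list."""
    depths: dict[str, int] = {}
    for url in urls:
        match = THREAD_PAGE_RE.search(url)
        if match is None:
            continue  # profile links, ?do=getLastComment links, malformed lines, etc.
        title, page = match.group(1), int(match.group(2))
        depths[title] = max(page, depths.get(title, 0))
    return depths


def pages_to_fetch(depths: dict[str, int], min_pages: int = 6) -> list[str]:
    """For threads with at least min_pages pages, list pages min_pages..max as clean URLs."""
    urls: list[str] = []
    for title, last in depths.items():
        if last >= min_pages:
            urls.extend(f"{BASE}/{title}/page/{n}/" for n in range(min_pages, last + 1))
    return urls


def main(in_path: str = "urls.txt", out_path: str = "extra_urls.txt") -> int:
    """Read URLs, compute the extra pages, write them out; return 0 on success, 1 on a file error."""
    try:
        lines = Path(in_path).read_text(encoding="utf-8").splitlines()
    except OSError as exc:
        print(f"could not read {in_path}: {exc}", file=sys.stderr)
        return 1
    extra = pages_to_fetch(max_pages([line.strip() for line in lines if line.strip()]))
    try:
        Path(out_path).write_text("\n".join(extra) + "\n", encoding="utf-8")
    except OSError as exc:
        print(f"could not write {out_path}: {exc}", file=sys.stderr)
        return 1
    print(f"wrote {len(extra)} URLs to {out_path}")
    return 0
```

In Replit, put `main()` on the last line of main.py. With the default arguments, it reads `urls.txt` and writes `extra_urls.txt`.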
Dean Pomerleau Posted Monday at 10:32 PM This thread has a bunch of images, mostly linked: https://www.crsociety.org/topic/11488-cold-exposure-other-mild-stressors-for-increased-health-longevity/#comment-14653
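Whether images survive a scrape depends, as Alex says, on whether they are internal or external URLs and on how they are embedded. Here is a rough sketch for sorting the images on a saved thread page into those two groups using Python's built-in `html.parser`. It collects `<img src>` values plus `<a href>` links ending in a common image extension, to catch images that are only linked, and treats anything on the same host as the page as internal. That host rule and the extension list are assumptions: a forum can serve uploads from a separate domain or CDN, which this would count as external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


class ImageCollector(HTMLParser):
    """Collect <img src> URLs and <a href> links that point at image files."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # Embedded image; resolve relative paths against the page URL.
            self.images.append(urljoin(self.page_url, attributes["src"]))
        elif tag == "a":
            href = attributes.get("href") or ""
            # Linked image: judge by the extension, ignoring any ?query or #fragment.
            path = href.lower().split("#")[0].split("?")[0]
            if path.endswith(IMAGE_EXTENSIONS):
                self.images.append(urljoin(self.page_url, href))


def split_images(html: str, page_url: str) -> tuple[list[str], list[str]]:
    """Return (internal, external) image URLs, judged by host name only."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    collector.close()
    site = urlsplit(page_url).netloc
    internal = [u for u in collector.images if urlsplit(u).netloc == site]
    external = [u for u in collector.images if urlsplit(u).netloc != site]
    return internal, external
```

Running this over a saved copy of the cold-exposure thread would show how many of its images live off-site. Those are the ones a crawler limited to crsociety.org would not capture.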