themaLeecher
http://leecher.themasoftware.com/forum/

[goodreads.com] How I can leech this site.
http://leecher.themasoftware.com/forum/viewtopic.php?f=19&t=6421

Author:  simonrule [ June 18th, 2018, 9:10 pm ]
Post subject:  [goodreads.com] How I can leech this site.

URL: https://www.goodreads.com/quotes?page=1

Error message: can't leech it perfectly.

Problem: the leeched output needs proper cleaning.

There is an example in this screenshot: http://prntscr.com/jwgvf3

Is there any way to leech this website so that the result looks like this?


“Don't cry because it's over, smile because it happened.”
― Dr. Seuss

tags: attributed-no-source, cry, crying, experience, happiness, joy, life, misattributed-dr-seuss, optimism, sadness, smile, smiling

If it's possible to get the image too, that would be great; if not, that's okay. I also want to remove anything related to this site, because I want to add the quotes to my own website.

Author:  Freddy [ June 19th, 2018, 8:25 am ]
Post subject:  Re: [goodreads.com] How I can leech this site.

You just need to add custom selectors: http://leecher.freddy.lt/faq.php?expand=faq206

Go to "WEBSITES" -> select that website -> "Selectors" tab -> add these selectors:

Subject:
Code:
h1


Message:
Code:
div.quoteDetails


Subject with URL:
Code:
a:contains(likes)


Remove elements:
Code:
div.quoteFooter


Tags:
Code:
div.quoteFooter


If you are using "Multi-Pages" leeching, enable the "Don't use subject from 'Subject with URL' selector" option.

If you are adding to "Pages", enable the "Don't use subject from 'Subject with URL' selector" option in "Other settings" (only for this website's pages).
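
For anyone curious what the "Message" and "Remove elements" selectors are meant to achieve, here is a rough Python sketch. It assumes the program behaves like a typical CSS-selector scraper, which this thread does not confirm. The markup is a made-up sample (not copied from goodreads), and QuoteExtractor is a hypothetical helper, not part of the program.

```python
from html.parser import HTMLParser


class QuoteExtractor(HTMLParser):
    """Collect the text of each div.quoteDetails, skipping div.quoteFooter."""

    def __init__(self):
        super().__init__()
        self.quotes = []    # finished message texts
        self._stack = []    # class lists of the currently open <div> tags
        self._parts = None  # text pieces of the quoteDetails being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "quoteDetails" in classes:
            self._parts = []  # "Message" selector: div.quoteDetails

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        classes = self._stack.pop()
        if "quoteDetails" in classes and self._parts is not None:
            self.quotes.append(" ".join(self._parts))
            self._parts = None

    def handle_data(self, data):
        # "Remove elements" selector: ignore text inside div.quoteFooter.
        in_footer = any("quoteFooter" in c for c in self._stack)
        if self._parts is not None and not in_footer and data.strip():
            self._parts.append(data.strip())


# Hypothetical sample markup, for illustration only.
sample = """
<div class="quoteDetails">
  <div class="quoteText">“Don't cry because it's over, smile because it happened.”
    <span>― Dr. Seuss</span></div>
  <div class="quoteFooter">tags: cry, smile <a href="#">123 likes</a></div>
</div>
"""

parser = QuoteExtractor()
parser.feed(sample)
parser.close()
print(parser.quotes)
```

The footer text (tags and the "likes" link) is left out of the message, which is the cleaning asked for in the first post.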

Author:  simonrule [ June 19th, 2018, 10:39 am ]
Post subject:  Re: [goodreads.com] How I can leech this site.

Works well, thank you so much.

Author:  simonrule [ June 19th, 2018, 12:26 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

I got a few skipped messages because they had already been leeched. Is there any way to clear everything? I was just testing sites, and I don't want the program to skip anything on this site.

Author:  Freddy [ June 19th, 2018, 2:47 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

simonrule wrote:
I got a few skipped messages because they had already been leeched. Is there any way to clear everything? I was just testing sites, and I don't want the program to skip anything on this site.


In "MESSAGES" -> select needed (or all) section(s) -> delete all messages.

Duplicate messages are skipped only if they are already in "MESSAGES".

Author:  simonrule [ June 19th, 2018, 3:34 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

I checked "MESSAGES" and removed all sections and all images, and I'm sure there are no messages left there, but when I tried again I still got duplicates. I'm waiting for the program to finish so I can check manually, because so far I have managed to leech only 150 messages and the rest were skipped; I think there are about 3000 messages. All I can think of right now is that the duplicates are on the website itself.

Results:
http://prntscr.com/jwt0my

It's really weird to get so many duplicates and leech only 150 messages; it's as if each quote had, for example, 10 duplicates.

Author:  Freddy [ June 19th, 2018, 6:30 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

How are you leeching? Using which feature?

Which URL exactly do you use? If "Multi-Pages", from which page to which page? Could you post a screenshot of the "Multi-Pages" tab showing the URL and your settings, so I can see what might be set wrong?

Author:  simonrule [ June 19th, 2018, 7:01 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

Freddy wrote:
How are you leeching? Using which feature?

Which URL exactly do you use? If "Multi-Pages", from which page to which page? Could you post a screenshot of the "Multi-Pages" tab showing the URL and your settings, so I can see what might be set wrong?


I'm using the "Multi-Pages" feature; here are the settings:
http://prntscr.com/jwuz99

https://www.goodreads.com/quotes
https://www.goodreads.com/quotes?page=#pageNumber#

What happened is that leeching went fine until I reached 150 messages; after that, all I get is skipped messages.

Author:  Freddy [ June 19th, 2018, 7:46 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

Increment should be left at 1 (not 30). It's not meant for that.

Right now it's leeching pages 1, 30, 60, 90, 120, 150, 180, 210, etc. (but there are only 100 pages, and after page 100 the site just shows the same quotes over and over again).

Always leave the increment at 1; other values are needed only in very rare cases, mostly for certain rare types of forums.
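
To illustrate why the increment matters, here is a small Python sketch of how a page template plus an increment turns into a list of URLs. It is only an approximation under stated assumptions: pages_visited and page_url are hypothetical names, and the program may count slightly differently (the post above lists 1, 30, 60, 90 rather than 1, 31, 61, 91). The point is the same either way: with an increment of 30, most of the 100 pages are never requested.

```python
# URL template from the "Multi-Pages" settings posted earlier in this thread.
TEMPLATE = "https://www.goodreads.com/quotes?page=#pageNumber#"


def pages_visited(first: int, last: int, increment: int) -> list[int]:
    """Page numbers requested when stepping from first to last (assumed logic)."""
    return list(range(first, last + 1, increment))


def page_url(page: int) -> str:
    """Fill the #pageNumber# placeholder with a concrete page number."""
    return TEMPLATE.replace("#pageNumber#", str(page))


every_page = pages_visited(1, 100, 1)   # increment left at 1
sparse = pages_visited(1, 100, 30)      # increment set to 30

print(len(every_page), "pages requested with increment 1")
print(len(sparse), "pages requested with increment 30:", sparse)
print(page_url(every_page[1]))
```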

Author:  simonrule [ June 19th, 2018, 9:49 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

Many thanks, just one more thing please.
The leeched message I get is this:

Quote:
Image
“Don't cry because it's over, smile because it happened.”

Dr. Seuss


Can I make it look like this?

Quote:
Image
“Don't cry because it's over, smile because it happened.”

―Dr. Seuss


You may not see any difference, but I want to remove anything related to the site I leeched from. If it's still unclear, just click on this message and you will see the difference.

Author:  Freddy [ June 20th, 2018, 7:36 am ]
Post subject:  Re: [goodreads.com] How I can leech this site.

You can do that with replacements.

In "REPLACEMENTS" tab add this replacement:
Search for:
Code:
\[url=https?://www\.goodreads.+?\](.+?)\[/url\]


Replace with:
Code:
$1


Enable regex.

After that in "WEBSITES" -> select this website -> "Replacements" tab -> select that replacement and press "Use selected replacements for website".

It will only affect newly leeched messages (you will need to re-leech).

To re-leech, select all messages in the "MESSAGES" tab and press the "Re-leech" button [green arrow] above the list (hover the mouse over the buttons for tooltips if needed).
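
To check what the replacement above does outside the program, here is a minimal Python sketch using the same search pattern. The input is a made-up BBCode sample with a hypothetical author URL, and strip_goodreads_links is an illustrative name. Note that Python writes the first capture group as \1 where the program uses $1.

```python
import re

# Search pattern copied from the replacement above.
GOODREADS_LINK = re.compile(r"\[url=https?://www\.goodreads.+?\](.+?)\[/url\]")


def strip_goodreads_links(bbcode: str) -> str:
    """Replace each goodreads [url=...]text[/url] with just its link text."""
    return GOODREADS_LINK.sub(r"\1", bbcode)


# Hypothetical leeched text; the URL path is invented for illustration.
sample = "[url=https://www.goodreads.com/author/show/0000.Example]Dr. Seuss[/url]"
print(strip_goodreads_links(sample))

# Links to other sites do not match the pattern and are left untouched.
other = "[url=https://example.com]x[/url]"
print(strip_goodreads_links(other))
```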

Author:  simonrule [ June 20th, 2018, 5:29 pm ]
Post subject:  Re: [goodreads.com] How I can leech this site.

Thank you so much for the help, everything is good now.

All times are UTC