Need help to find a fingerprint for 60+ ippen.media newssites #926
Comments
An include command could also help. Then I could create the 60+ files just once and put an [...] Another advantage would be that you could add extra settings per site if necessary.
I think I will try to fingerprint this line [...] @fivefilters, if my test at the weekend succeeds, would you add this to your standard config.php? Especially on fivefilters.org in your live system. The release for self-hosters will surely take longer, won't it? I would then write a note for them in .ippen.media.txt.
@HolgerAusB Thanks, we'll take a look soon and try to add it to the config. You're right that at the moment the config.php is only updated with each new release. It would be nice to have a way of updating these via the site config files somehow. We've had a suggestion of linking site config files together using symlinks before, but I'm worried it'll create difficulty inside git, zip, and the different platforms people use to run the software.
@fivefilters, sorry to bother you. Do you have a rough estimate of when you can add the fingerprint to the config? I don't want to rush you. I just have to make the first fix inside .ippen.media. If you don't have time right now, I would create copies for the larger newspapers' domains and put them in the PR (see also #928).
Sorry @fivefilters, @j0k3r, me again. It seems that my fingerprint is not found under some circumstances, even when it is there. So instead I will write a script that copies an ippen template for every site, picking the test link from a second file on my side. Then I will upload all the files and open a PR for the whole bunch.
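A script along the lines described above could look roughly like this. It is only a sketch, not the script that was actually used: the function name `generate_site_configs`, the template content, and the hostname-to-test-URL mapping are all assumptions made for illustration.

```python
from pathlib import Path


def generate_site_configs(template: str, test_urls: dict, out_dir: Path) -> list:
    """Write one <hostname>.txt per site: the shared template plus that site's test_url."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for hostname, test_url in sorted(test_urls.items()):
        target = out_dir / f"{hostname}.txt"
        # Every file gets the same rules; only the trailing test_url line differs.
        body = template.rstrip("\n") + f"\ntest_url: {test_url}\n"
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written


# Placeholder template: the real ippen rules are not shown in this thread.
EXAMPLE_TEMPLATE = "strip_id_or_class: example-clutter\n"

# Placeholder mapping: real entries would point at article pages, not home pages.
EXAMPLE_TEST_URLS = {
    "fr.de": "https://www.fr.de/",
    "hna.de": "https://www.hna.de/",
}
```

Calling `generate_site_configs(EXAMPLE_TEMPLATE, EXAMPLE_TEST_URLS, Path("out"))` would write `out/fr.de.txt` and `out/hna.de.txt`. Re-running it after a template change regenerates every file, which is the flexibility the per-site approach trades against having many files.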
@j0k3r, I don't really know how to proceed now. YOUR suggestion currently works only in Graby/Wallabag. FTR currently only searches within [...] And even if @fivefilters finds time to include one of the fingerprints, you never know how long it will stay valid. If the provider changes something, it may again take weeks until the fix is included in a new version of FTR. Even if it means more work for me, it is more flexible if I just run my script and create 50+ single sitename.de.txt files from a template.
So I have now finished generating config files for 56 news sites. I'm just not sure whether to open the PR or wait for an answer.
Wait for @fivefilters' answer.
Thanks @HolgerAusB @j0k3r! I'm happy for these to be merged until we can improve the fingerprint handling in Full-Text RSS. I think in the future these fingerprints should really exist outside of the code so they can be updated without the need for new code releases.
Just a new idea: how about referring to another config file? All 56 of these site files could then contain something like this: [...]
I quite like this suggestion. There was a proposal a while back using symlinks to achieve something similar, but I was concerned the symlinks wouldn't survive different systems and packaging (e.g. zip, where symlinks aren't part of the format itself). I think this has the advantage that we can also add a test_url specific for the entry, which should hopefully allow us to catch problems in the future. What do you think @j0k3r?
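To make the idea concrete, here is a rough sketch of how a parser might expand a reference to another config file while keeping a per-site test_url. The directive name `include_config` is invented for this example; the thread does not settle on a name or syntax, and nothing here reflects how FTR or Graby actually parse site configs.

```python
def parse_config(text: str) -> list:
    """Split 'key: value' lines into pairs, skipping blank lines and '#' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def resolve(name: str, configs: dict, _seen: frozenset = frozenset()) -> list:
    """Expand hypothetical include_config lines in place, rejecting circular references."""
    if name in _seen:
        raise ValueError(f"circular include via {name}")
    resolved = []
    for key, value in parse_config(configs[name]):
        if key == "include_config":  # hypothetical directive, not an existing one
            resolved.extend(resolve(value, configs, _seen | {name}))
        else:
            resolved.append((key, value))
    return resolved
```

With a shared file holding the rules and a small per-site file holding only the include line and a `test_url`, `resolve` returns the shared rules followed by the site's own `test_url`. Unlike symlinks, this is plain text, so it survives git, zip, and any platform.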
That's an interesting suggestion. I think I still prefer the fingerprint because it avoids having many files. But the idea of having one real [...] One question comes to mind: should the included config file have a custom name, like [...]
I like having the fingerprint option too, in such a way that they can be maintained independent of the code. So I'm open to implementing both options. To me, the fingerprint option was intended to be more far-reaching, for example for platforms like WordPress.com or Medium.com that might use an HTML template we can target, but let people use their own domain names, resulting in thousands or millions of sites we can't possibly know about or wish to store individually in this repository. In this particular case, it's quite a lot of sites, so I can see the fingerprint option being better suited for it, especially if more get added and we don't want to track them ourselves. And if there are some sites within this group that would benefit from having a test_url, I think we could have site config files with only test_url lines, which would allow us to monitor for changes.
Follow-up from #1184. For FTR: [...] For Graby/Wallabag: [...] I checked with FTR and Wallabag; it should work now.
@fivefilters the Substack fingerprint isn't live in the current release of FTR, right?
@HolgerAusB It is, but it's a little longer. We shortened it recently as the longer one didn't appear on the custom domains tested. That shorter one which I shared earlier will be in the next version.
Added the fingerprint for Substack to my Graby PR, thank you.
ippen.media is a big German news company with a large number of (mostly local) German newspapers and magazines and their corresponding websites. Most, if not all, of these websites use the same (ugly) CMS. Most clutter is already filtered by fivefilters by default; I have now managed to write a config that strips the small remainder of annoying parts.
But no one wants to maintain 60+ config files, one for each domain. My skills aren't enough to find a single reliable fingerprint for the array in custom_config.php, so I just put my local news site in there, e.g. [...]
That is good for me, but not for hosted versions. So could someone please find a single, reliable fingerprint that matches most ippen.media websites?
Here are some of the bigger papers:
https://www.fr.de/
https://www.fnp.de/
https://www.hna.de/
https://www.hanauer.de/
https://www.kreiszeitung.de/
To get the original feed, append the following to the URL of a category page:
rssfeed.rdf
e.g. https://www.fr.de/hessen/rssfeed.rdf
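For testing several sites, the feed URL rule above can be wrapped in a small helper. This is just a convenience sketch; the function name `category_feed_url` is made up, and it assumes every category page follows the same pattern as the fr.de example.

```python
from urllib.parse import urljoin


def category_feed_url(category_url: str) -> str:
    """Append rssfeed.rdf to a category page URL, adding a trailing slash if missing."""
    if not category_url.endswith("/"):
        # Without the slash, urljoin would replace the last path segment.
        category_url += "/"
    return urljoin(category_url, "rssfeed.rdf")
```

For example, `category_feed_url("https://www.fr.de/hessen")` gives `https://www.fr.de/hessen/rssfeed.rdf`.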
Or is there any other way to provide custom fingerprints in a separate file? I think that if @fivefilters blew up his array with 60+ domains, it could slow down performance for ALL users, which should not happen!
I tried to use the div around the IM logo in the upper right of the page, but I found out that fingerprinting only works in the head of the page and not the body. Or did I make a mistake there?
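To illustrate the head-versus-body behaviour described here, this is a minimal sketch of a fingerprint lookup that only inspects the `<head>` element. It is not FTR's implementation (which is PHP and may work differently); the fingerprint fragment, the hostname `fingerprint.example.com`, and the function name are all assumptions for the example.

```python
import re
from typing import Optional

# Hypothetical table: HTML fragment -> hostname of the site config to apply.
FINGERPRINTS = {
    '<meta name="generator" content="ExampleCMS"': "fingerprint.example.com",
}

# Captures everything between <head ...> and </head>; \b keeps <header> from matching.
HEAD_RE = re.compile(r"<head\b[^>]*>(.*?)</head>", re.IGNORECASE | re.DOTALL)


def match_fingerprint(html: str, fingerprints: dict = FINGERPRINTS) -> Optional[str]:
    """Return the config hostname for the first fragment found inside <head>, else None."""
    head = HEAD_RE.search(html)
    if head is None:
        return None
    head_html = head.group(1)
    for fragment, hostname in fingerprints.items():
        if fragment in head_html:
            return hostname
    return None
```

Under this assumption, a marker such as a logo div in the body would never match, however distinctive it is. The lookup is a substring test per fragment on one slice of the page, so its cost grows with the number of fingerprints rather than with the number of domains they cover.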