Add load from csv #90

Open
janetriley opened this issue Jun 6, 2017 · 3 comments
@janetriley
Collaborator

console/commands/load can handle OPML files.

I don't have an OPML file, and couldn't easily find an OPML editor. CSV, however, is easy to compose.

Add support for loading feeds from CSV.

@janetriley janetriley self-assigned this Jun 6, 2017
@bbengfort
Member

Great idea! As for an OPML editor, we actually used Feedly, which has an export-to-OPML feature. Still, CSV support is a great feature to add!

@janetriley
Collaborator Author

It looks like the required fields to create a Feed are link and category, with an optional title.

Is that right?

Here's my understanding of a Feed:

from baleen.models:

class Feed(me.DynamicDocument):
    # my (optional) title for this feed
    title = me.StringField(max_length=256)

    # the link to get the RSS feed. FeedParser may update it during sync if it sees a different href.
    link = me.URLField(required=True, unique=True)

    # A dict of xmlURL, which is the link above, and an htmlURL, which is ...? the human-friendly version of the site?
    urls = me.DictField()

    # my name for the collection of documents - like a corpus name. One category per feed.
    category = me.StringField(required=True)

    # for Baleen - guessing the Job ignores inactive feeds
    active = me.BooleanField(default=True)

    # fields that the FeedParser package modifies
    version = me.StringField(choices=FEEDTYPES)
    etag = me.StringField()
    modified = me.StringField()
    fetched = me.DateTimeField(default=None)
    signature = me.StringField(max_length=64, min_length=64, unique=False)

    created = me.DateTimeField(default=datetime.now, required=True)
    updated = me.DateTimeField(default=datetime.now, required=True)

Am I heading in the right direction? This is simpler than I was expecting.

@bbengfort
Member

Yep, that's pretty much correct. The OPML file doesn't contain much information: title and link are by far the most important, with category and active being of secondary importance.
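
For illustration, here is a minimal sketch of how CSV rows might be turned into Feed fields, based on the fields discussed above. It assumes a header row with `link` and `category` columns plus an optional `title` column. The function name `read_feeds_csv` and the column layout are hypothetical and not part of Baleen. It uses only the standard `csv` module and does not touch the database.

```python
import csv
import io

# Assumption: these mirror the fields marked required=True in the Feed model.
REQUIRED = ("link", "category")


def read_feeds_csv(handle):
    """Yield one dict of Feed keyword arguments per CSV data row.

    `handle` is any text file object whose first row is a header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError("CSV is missing required column(s): " + ", ".join(missing))

    for row in reader:
        # Skip the None key DictReader uses for surplus cells; treat short rows as empty.
        fields = {key: (value or "").strip() for key, value in row.items() if key}
        empty = [name for name in REQUIRED if not fields.get(name)]
        if empty:
            raise ValueError(
                "line %d: empty required field(s): %s"
                % (reader.line_num, ", ".join(empty))
            )

        feed = {"link": fields["link"], "category": fields["category"]}
        if fields.get("title"):  # title is optional, so omit it when blank
            feed["title"] = fields["title"]
        yield feed


# Usage with an in-memory file; the URLs are placeholders.
sample = io.StringIO(
    "title,link,category\n"
    "Example Blog,http://example.com/rss,news\n"
    ",http://example.org/feed.xml,tech\n"
)
feeds = list(read_feeds_csv(sample))
```

Each resulting dict could then be passed to the model as keyword arguments, e.g. `Feed(**feed)`. How the load command should treat links that already exist is left open here.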
