I'm trying to import an RSS feed into Core Data. Once the items are imported, how do I most efficiently prevent duplicates when I update the feed again later? Right now every item is checked against the data store during parsing, which is not very efficient.
I looked into the Top Songs sample from Apple. It uses a least-recently-used cache for categories, but when every item is different the cache doesn't help at all.
EDIT: To clarify, I can already identify each item in the feed uniquely by its guid. The issue is the performance of comparing hundreds of items against the database every time, when most of them are duplicates.
When you are importing a new row, you can run a query against the existing rows to see if it is already in place. To do this, create an NSFetchRequest against your entity, set the predicate to look for the guid property, and set the maximum number of rows returned to 1.
I would recommend keeping this NSFetchRequest around during your import so that you can reuse it for every item. If the NSFetchRequest returns a row, you can update that row. If it does not return a row, you can insert a new row.
When done correctly you will find the performance more than acceptable.
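As a rough illustration of the fetch-request approach described above, here is a minimal Swift sketch. The entity name "FeedItem", its "guid" and "title" attributes, the FeedImporter class and the makeInMemoryContext helper are all assumptions made up for this example; the helper only exists so the sketch can be tried without a model file.

```swift
import CoreData

// Hypothetical helper: builds an in-memory stack with one assumed entity,
// "FeedItem", that has string attributes "guid" and "title".
func makeInMemoryContext() throws -> NSManagedObjectContext {
    let guid = NSAttributeDescription()
    guid.name = "guid"
    guid.attributeType = .stringAttributeType

    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType

    let entity = NSEntityDescription()
    entity.name = "FeedItem"
    entity.properties = [guid, title]

    let model = NSManagedObjectModel()
    model.entities = [entity]

    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    _ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                           configurationName: nil,
                                           at: nil,
                                           options: nil)

    let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}

// Hypothetical importer: keeps one fetch request alive for the whole import
// and only swaps its predicate for each incoming item.
final class FeedImporter {
    private let context: NSManagedObjectContext
    private let request: NSFetchRequest<NSManagedObject>

    init(context: NSManagedObjectContext) {
        self.context = context
        request = NSFetchRequest<NSManagedObject>(entityName: "FeedItem")
        request.fetchLimit = 1 // at most one row can match a unique guid
    }

    // Updates the row with this guid if one exists, otherwise inserts a new row.
    @discardableResult
    func upsert(guid: String, title: String) throws -> NSManagedObject {
        request.predicate = NSPredicate(format: "guid == %@", guid)
        let item = try context.fetch(request).first
            ?? NSEntityDescription.insertNewObject(forEntityName: "FeedItem", into: context)
        item.setValue(guid, forKey: "guid")
        item.setValue(title, forKey: "title")
        return item
    }
}
```

In a real import you would call upsert once per parsed item and save the context at the end, or in batches. The lookup will probably only perform well if the guid attribute is indexed in the model.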
Can you modify your Core Data model?
If you can, I would add a "Hash" property to each feed entry to uniquely identify it. Then you could efficiently detect whether a specific entry is already in your database.
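One possible way to compute such a hash, sketched in Swift. The entryHash function, the choice of SHA-256 and the idea of hashing a list of string fields are assumptions for illustration only; the answer does not say which fields or which hash function to use.

```swift
import CryptoKit
import Foundation

// Hypothetical helper: derives a stable hex digest from the fields that
// identify a feed entry (for example its guid and link).
func entryHash(_ fields: [String]) -> String {
    // Join with a newline so ["ab", "c"] and ["a", "bc"] hash differently.
    let input = Data(fields.joined(separator: "\n").utf8)
    return SHA256.hash(data: input)
        .map { String(format: "%02x", $0) }
        .joined()
}
```

The resulting string could be stored in the "Hash" property and compared on later imports. If the feed already provides a reliable guid, the guid itself can serve the same purpose.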