Here are examples of content aggregators you can use for inspiration:
- Hvper: A news aggregator covering sites such as Reddit, Google News, and BuzzFeed
- AllTop: An aggregator of popular sites that also lets you build a customized page from any RSS feed
Here are resources that you can use to build your content aggregator:
- requests: HTTP library for Python, built for human beings
- Beautiful Soup: Python library for quick turnaround projects like screen-scraping
- sqlite3: A self-contained, serverless, transactional SQL database engine
- celery: Distributed task queue
- apscheduler: In-process task scheduler with Cron-like capabilities
00:00 Web Project Ideas. In this section, you’re going to see some projects which lend themselves readily to being created for the web, but that doesn’t mean to say they have to be implemented solely for that platform, and you may find they’re suitable for GUI or even CLI implementations.
00:28 Let’s have a look at a content aggregator. Content is king—it exists everywhere on the web, from blogs to social media platforms. To keep up, you need to search for new information on the internet constantly.
00:58 With a content aggregator, you don’t have to visit multiple sites to get the latest information: one site that aggregates everything you’re interested in is enough.
01:10 You can see all of the posts that interest you and decide whether to find out more about them without having to traipse all over the internet. Let’s look at a couple of implementations of content aggregators. Here, you can see Hvper, which aggregates a number of news sites, such as Reddit, Google News, and BuzzFeed. And here you can see AllTop, which aggregates a number of popular sites—TechCrunch, Wired, the New York Times Front Page, et cetera—but also allows you to create a customized page with any RSS feed.
Now, let’s look at some of the technical details that you’ll need to implement to create a content aggregator. Firstly, you’ll need to access content with libraries such as requests and a new one, Beautiful Soup. We’ve already seen requests, which is an excellent way to access web data. Beautiful Soup is a Python library for pulling data out of the returned HTML, and it allows quick access to the semantic contents of web pages, allowing straightforward scraping of data from websites.
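One way this might look, using a hardcoded HTML snippet in place of a page fetched with requests (the tag names and classes here are invented for illustration, not taken from any real site):

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML a requests.get() call would return.
html = """
<ul class="posts">
  <li><a class="post-link" href="/a">First headline</a></li>
  <li><a class="post-link" href="/b">Second headline</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selectors give quick access to the semantic parts of the page.
posts = [(a.get_text(), a["href"]) for a in soup.select("a.post-link")]
print(posts)  # [('First headline', '/a'), ('Second headline', '/b')]
```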
Using a library like Beautiful Soup to do this kind of work will save you hours, if not weeks, of programming and allows you to access the content quickly with a minimum of fuss. However, you need to ensure that you’re not breaking a site’s terms of service when scraping information from it in this way. Next up, you’re going to need a database for storage and recall of the data that you’ve obtained. This could be something simple such as sqlite3, or it could be the ORM that comes as part of the framework you’re using, which is one of the strengths of a framework such as Django. If you’re using a micro-framework like Flask, you may need to implement your own solution.
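A simple storage sketch with the standard library’s sqlite3, assuming the same hypothetical (title, url) post format as above (the table layout is one possible design, not a prescribed schema):

```python
import sqlite3

# In-memory database for illustration; a real aggregator would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS posts (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           url TEXT NOT NULL UNIQUE,
           source TEXT NOT NULL
       )"""
)

# INSERT OR IGNORE skips posts already stored, so re-running a scrape
# doesn't create duplicate rows (the url column is UNIQUE).
posts = [("First headline", "/a", "example-site"),
         ("Second headline", "/b", "example-site")]
conn.executemany(
    "INSERT OR IGNORE INTO posts (title, url, source) VALUES (?, ?, ?)", posts
)
conn.commit()

for title, url in conn.execute("SELECT title, url FROM posts ORDER BY id"):
    print(title, url)
```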
Now, let’s look at some extra challenges when programming your content aggregator, the first of which will be adding new websites. Adding new websites to an aggregator will mean accessing content that’s formatted in a different way, meaning you need to use a different structure to access it, using Beautiful Soup or possibly an API. Secondly, user implementation.
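One way to keep per-site differences manageable is to give each site its own parser function that returns a common format. The site names, selectors, and markup below are all hypothetical:

```python
from bs4 import BeautifulSoup

def parse_example_blog(html):
    # This site links posts with <a class="post-link">.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(), a["href"]) for a in soup.select("a.post-link")]

def parse_other_site(html):
    # This site wraps each post title in <h2 class="entry"><a>.
    soup = BeautifulSoup(html, "html.parser")
    return [(h.get_text(), h.a["href"]) for h in soup.select("h2.entry")]

# Adding a new website then means writing one parser and registering it here.
PARSERS = {
    "example-blog": parse_example_blog,
    "other-site": parse_other_site,
}
```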
03:37 Adding different users to the site could give each of them a different view. Each user could mark a story as read or see their own updated stories whenever they visit the site. There’s also site selection for users: each user could choose the sites they want their data to come from, such as from a list of implemented sites.