Content Aggregator

Grow Your Python Portfolio With 13 Intermediate Project Ideas Darren Jones 03:57

Here are examples of content aggregators you can use for inspiration:

Here are resources that you can use to build your content aggregator:

requests: HTTP library for Python, built for human beings
Beautiful Soup: Python library for quick turnaround projects like screen-scraping
sqlite3: A self-contained, serverless, transactional SQL database engine
celery: Distributed task queue
apscheduler: In-process task scheduler with Cron-like capabilities

00:00 Web Project Ideas. In this section, you’re going to see some projects that lend themselves readily to the web, but that doesn’t mean they have to be implemented solely for that platform. You may find they’re also suitable for GUI or even CLI implementations.

00:16 First up, a content aggregator. Then, you’re going to see a regex query tool, a URL shortener, post-it notes, and finally, a quiz application.

00:28 Let’s have a look at a content aggregator. Content is king—it exists everywhere on the web, from blogs to social media platforms. To keep up, you need to search for new information on the internet constantly.

00:40 One way to do this is to check all of the sites manually to see what new posts are present, but this is time-consuming, inefficient, and can be pretty tiring.

00:49 This is where a content aggregator comes in. A content aggregator fetches information from various places online and gathers all of that information in a single site.

00:58 Therefore, you don’t have to visit multiple sites to get the latest information. One site that aggregates everything you’re interested in will be enough.

01:42 Now, let’s look at some of the technical details you’ll need to implement to create a content aggregator. Firstly, you’ll need to access content with libraries such as requests and a new one, BeautifulSoup.

01:55 We’ve already seen requests, which is an excellent way to access web data. BeautifulSoup is a Python library for pulling data out of the HTML that requests returns. It gives you quick access to the semantic contents of web pages, which makes scraping data from websites straightforward.
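
As a rough illustration of how requests and BeautifulSoup can work together, here is a minimal sketch. The function names, the `h2 a` selector, and the sample markup are assumptions made for this example; a real site will need its own selector, and you should check that the site allows scraping.

```python
import requests
from bs4 import BeautifulSoup


def parse_headlines(html, selector="h2 a"):
    """Return (title, link) pairs for every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for anchor in soup.select(selector):
        title = anchor.get_text(strip=True)
        link = anchor.get("href")
        if title and link:
            headlines.append((title, link))
    return headlines


def fetch_headlines(url, selector="h2 a"):
    """Download a page with requests and extract its headlines."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_headlines(response.text, selector)


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/posts/first">First post</a></h2>
  <h2><a href="/posts/second">Second post</a></h2>
  <h2>No link here</h2>
</body></html>
"""

print(parse_headlines(SAMPLE_HTML))
# [('First post', '/posts/first'), ('Second post', '/posts/second')]
```

Keeping the parsing step separate from the download step means you can try out selectors on saved HTML without hitting the network each time.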

02:39 This is for storage and recall of the data that you’ve obtained. This may well be handled by the ORM that’s part of the framework you’re using, which is one of the strengths of a framework such as Django. If you’re using a micro-framework like Flask, you may need to implement your own solution.
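
If you do end up implementing your own storage, one possible starting point is the sqlite3 module from the standard library. The sketch below is only an illustration: the `articles` table, its columns, and the helper names are assumptions, not part of any framework.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the articles table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            url        TEXT PRIMARY KEY,
            title      TEXT NOT NULL,
            source     TEXT NOT NULL,
            fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def save_articles(conn, source, articles):
    """Insert (title, url) pairs, skipping URLs that are already stored.

    Returns the number of newly inserted rows.
    """
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO articles (url, title, source) VALUES (?, ?, ?)",
            [(url, title, source) for title, url in articles],
        )
    return conn.total_changes - before


def latest_articles(conn, limit=10):
    """Return the most recently stored (title, url, source) rows."""
    cursor = conn.execute(
        "SELECT title, url, source FROM articles "
        "ORDER BY fetched_at DESC, rowid DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

Using the URL as the primary key together with `INSERT OR IGNORE` is one simple way to avoid storing the same story twice when a site is fetched repeatedly.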

02:54 This can have pros and cons. Next up, scheduling. As seen previously, using a library such as celery or apscheduler will allow the data to be regularly updated.

03:06 This will mean you’ll be able to keep track of what has been in your content aggregator, even if you haven’t been visiting, and possibly look at historical data.

03:15 Now, let’s look at some extra challenges when programming your content aggregator, the first of which is adding new websites. Adding new websites to an aggregator means accessing content that is formatted in a different way, so you need to use a different structure to access it, using BeautifulSoup or possibly an API. Secondly, user implementation.
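
One way you might approach the first challenge, adding sites with different formats, is to keep a small registry with one parser per site. This is only a sketch: the site names, the HTML selector, and the JSON field names (`items`, `headline`, `link`) are all invented for illustration.

```python
import json

from bs4 import BeautifulSoup

# Maps a site name to the function that understands that site's format.
PARSERS = {}


def register(site_name):
    """Decorator that adds a parser function to the registry."""
    def decorator(func):
        PARSERS[site_name] = func
        return func
    return decorator


@register("example-blog")
def parse_example_blog(raw):
    """Hypothetical HTML site: headlines are links inside <article><h2>."""
    soup = BeautifulSoup(raw, "html.parser")
    return [
        {"title": a.get_text(strip=True), "url": a["href"]}
        for a in soup.select("article h2 a[href]")
    ]


@register("example-api")
def parse_example_api(raw):
    """Hypothetical JSON API: stories live under an "items" key."""
    payload = json.loads(raw)
    return [
        {"title": item["headline"], "url": item["link"]}
        for item in payload["items"]
    ]


def parse(site_name, raw):
    """Dispatch raw content to the parser registered for the site."""
    try:
        parser = PARSERS[site_name]
    except KeyError:
        raise ValueError(f"No parser registered for {site_name!r}") from None
    return parser(raw)
```

With this layout, supporting another site means writing one more decorated function, while the rest of the aggregator keeps calling `parse()` and receives the same list-of-dicts shape.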

03:37 Adding different users to a site could give each of them a different view. Each could mark a story as read or ask for their own updated stories whenever they visit the site. Also, a selection of sites for users: each user could select the sites they want the data to come from, such as from a list of implemented sites.
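
The per-user ideas above could be modeled in many ways. Here is one small in-memory sketch, assuming each article is a dict with `source` and `url` keys. The `UserProfile` class and the `AVAILABLE_SITES` names are hypothetical, and a real site would keep this state in its database.

```python
from dataclasses import dataclass, field

# Hypothetical list of sites the aggregator already knows how to fetch.
AVAILABLE_SITES = {"example-blog", "example-api", "example-news"}


@dataclass
class UserProfile:
    name: str
    selected_sites: set = field(default_factory=set)
    read_urls: set = field(default_factory=set)

    def select_site(self, site):
        """Subscribe the user to one of the implemented sites."""
        if site not in AVAILABLE_SITES:
            raise ValueError(f"Unknown site: {site!r}")
        self.selected_sites.add(site)

    def mark_read(self, url):
        """Remember that the user has read the story at this URL."""
        self.read_urls.add(url)

    def unread(self, articles):
        """Return articles from the user's sites that are not yet read."""
        return [
            article
            for article in articles
            if article["source"] in self.selected_sites
            and article["url"] not in self.read_urls
        ]
```

For example, after `user.select_site("example-blog")` and `user.mark_read("/a")`, calling `user.unread(articles)` would leave out the story at `/a` as well as anything from sites the user has not selected.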
