Reading Multiple File Inputs
00:00
In this lesson, I’ll show you how to read from multiple file inputs one after another with the fileinput module, which is a cool little trick that essentially makes reading multiple files feel like reading just one.
00:15
The fileinput module is quite simple. It mostly has just the function input(), which takes in an optional list of filenames and defaults to the command-line arguments.
00:27
So, the arguments provided when this Python module is run, which is sys.argv[1:], since sys.argv[0] is the script name. It takes all of the filenames from that list and returns, essentially, just one input stream that you can then iterate through like any other input stream.
00:45 It also provides some useful information about each line in the output, like the line number, whether the line is the first line in its file, and so on. It gives you some stuff to work with instead of just pretending that it’s all one big file, so that’s pretty helpful.
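Here is a minimal sketch of that per-line information. The helper name describe_lines() and the two throwaway files are assumptions made for illustration; fileinput.filename(), fileinput.lineno(), fileinput.filelineno(), and fileinput.isfirstline() are the module's own functions.

```python
import fileinput
import os
import tempfile


def describe_lines(filenames):
    """Collect what fileinput reports about every line it reads.

    describe_lines is a hypothetical helper, not part of fileinput.
    """
    rows = []
    with fileinput.input(files=filenames) as stream:
        for line in stream:
            rows.append((
                os.path.basename(fileinput.filename()),  # file the line came from
                fileinput.lineno(),       # cumulative line number across all files
                fileinput.filelineno(),   # line number within the current file
                fileinput.isfirstline(),  # True only on the first line of a file
                line.rstrip("\n"),
            ))
    return rows


# Two throwaway files in a temporary directory (assumption for the demo)
demo_dir = tempfile.mkdtemp()
demo_paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(demo_dir, name)
    with open(path, "w") as f:
        f.write(text)
    demo_paths.append(path)

for row in describe_lines(demo_paths):
    print(row)
```

The cumulative line number keeps counting across files, while the per-file line number restarts at 1 whenever a new file begins.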
01:00 I won’t be using a sample directory for this, I’ll just be creating the files that I need in a blank directory as I go. For this lesson, I figured I’d do something just a little bit different.
01:11
What I want to do is go through a quick little project with you to show you the importance, or at least the usefulness, of fileinput. First, after listing the directory to show you that it’s empty, my goal is to create a crude version of cat, which is a Linux utility that reads multiple files and displays their contents
01:36
sequentially in stdout (standard out). You can see how this would be useful because you could combine this with os.listdir() to get a nice, concise summary of the contents of everything in your list.
01:48 Well, it may not be concise, if your files aren’t concise. But what I mean is just that you could look over the contents of all the files in a directory really easily, if you could have this utility.
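As a rough sketch of that idea, the hypothetical helper cat_directory() below feeds the regular files found by os.listdir() into fileinput.input(). The helper name, the sorting, and the demo files are assumptions, not something from the lesson.

```python
import fileinput
import os
import tempfile


def cat_directory(directory):
    """Return the concatenated contents of every regular file in a directory.

    cat_directory is a hypothetical helper used only for illustration.
    """
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    if not paths:
        # An empty list would make fileinput fall back to reading stdin.
        return ""
    with fileinput.input(files=paths) as stream:
        return "".join(stream)


# Demo directory with two small files (assumption for the demo)
listing_dir = tempfile.mkdtemp()
for name, text in [("first.txt", "alpha\n"), ("second.txt", "beta\n")]:
    with open(os.path.join(listing_dir, name), "w") as f:
        f.write(text)

print(cat_directory(listing_dir), end="")
```

os.listdir() returns names in arbitrary order, which is why the sketch sorts them before reading.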
01:58
This is something that’s very common to use in Linux and in the macOS terminal, and so on. It’s a super-useful utility. So, the fileinput module can be used to do that, but first I’m going to need to create some files.
02:10
So I’ll say flist = ["file1.txt", "file2.txt", "file3.txt"], and that should be enough for our purposes. And now I’ll say for fname in flist:, and then with open(fname, "w") as f: to open each file in write mode. I’ll call f.write() and make the argument an f-string (there’s a lot of f’s going on here, but that’s how it goes sometimes): f"Hi I'm file #". Oh, and I want to make sure I keep track of the file number, so I’ll use the enumerate() function here and loop over enumerate(flist) instead.
02:58
And then I can say "I'm file #", and I’ll actually need to say "{i + 1}" to account for zero-indexing. This should work just great and, in fact, it creates all these files.
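Put together, the file-creation loop might look like the sketch below. Writing into a temporary directory (work_dir) and ending each line with a newline are assumptions made so the example leaves the current directory untouched; the lesson creates the files in an empty working directory.

```python
import os
import tempfile

flist = ["file1.txt", "file2.txt", "file3.txt"]

# Assumption: use a temporary directory instead of the current one.
work_dir = tempfile.mkdtemp()

# enumerate() yields (index, filename) pairs, with the index starting at 0
for i, fname in enumerate(flist):
    with open(os.path.join(work_dir, fname), "w") as f:
        # i + 1 turns the zero-based index into a human-friendly file number
        f.write(f"Hi I'm file #{i + 1}\n")

print(sorted(os.listdir(work_dir)))
```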
03:10
They’re in kind of an odd order, but don’t worry about that. That’s just how it goes with your system sometimes. Now I have some files to test this utility on, and I want to define a function here called basic_cat().
03:23
It will take in a file list, and then it will say lines = fileinput.input(), and I have to call it on the file list. If I don’t put any files in here, then it will actually just use the system arguments.
03:39 So when I call this Python file that I would be actually creating here from the command line, it would use those arguments. But as of now, I’m going to make sure that it uses this file list that’s passed in.
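The difference between the two calls can be sketched like this. The helper count_lines(), the script name count_lines.py, and the temporary notes.txt file are all hypothetical; the demo swaps sys.argv temporarily to imitate a command-line run.

```python
import fileinput
import os
import sys
import tempfile


def count_lines(files=None):
    """Count lines across files.

    With files=None, fileinput.input() reads the names from sys.argv[1:].
    count_lines is a hypothetical helper, not part of fileinput.
    """
    with fileinput.input(files=files) as stream:
        return sum(1 for _ in stream)


# A throwaway file to read (assumption for the demo)
notes_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(notes_path, "w") as f:
    f.write("first\nsecond\n")

# Imitate running `python count_lines.py notes.txt` from the shell
saved_argv = sys.argv
sys.argv = ["count_lines.py", notes_path]
try:
    from_argv = count_lines()  # no list passed, so sys.argv[1:] is used
finally:
    sys.argv = saved_argv      # always restore the real arguments

from_list = count_lines([notes_path])  # explicit list, sys.argv is ignored
print(from_argv, from_list)
```

One thing to keep in mind: if no list is passed and sys.argv[1:] is also empty, fileinput reads from standard input instead.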
03:51
So now what I can say is for line in lines, and the first thing I want to do is maybe print out some kind of message to let you know that you’re on a new file.
04:01
Luckily, fileinput has a nice way to do this, which is to say if fileinput.isfirstline(),
04:08
then I will print out an f-string that says f"Processing file: "
04:14
and I’ll use here the "{fileinput.filename()}" function call.
04:21 So, that’ll work well, it will print out that we’re processing a file. And maybe one more thing that I want to do is just give that—I’m just going to give it a newline so that it’s clear, or at least that it’s obvious when you look through this output, that there’s a new file being processed.
04:36
So, now what I can do is I can just print out the line and maybe I’ll have a little thing here, like an arrow that just says that this is a line. So, this is really pretty much all you need for a basic_cat() function.
04:51
So now I’ll just try calling basic_cat() on flist, which luckily I already have defined. What it does is it says Processing file: file1.txt. -> Hi I'm file #1, Processing file: file2.txt. -> Hi I'm file #2, Processing file: file3.txt. -> Hi I'm file #3.
05:09 And of course, if you look at this output, what I really should have done here is put a newline at the beginning, too, so that it was clear what was actually happening.
05:19 And luckily this does make it a little bit easier. It processes the file, and then it has the lines, processes the file, has the lines. So, that was some awkward spacing, but with an extra newline, it works out just great.
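For reference, here is one way the finished function could look, with the extra newline before each header. This is a sketch rather than the exact code typed in the video: splitting the work into a hypothetical cat_lines() generator plus basic_cat() is an assumption made so the text can be inspected without printing, and the demo files live in a temporary directory.

```python
import fileinput
import os
import tempfile


def cat_lines(file_list):
    """Yield the pieces of text that basic_cat() prints.

    cat_lines is a hypothetical helper introduced for this sketch.
    """
    with fileinput.input(files=file_list) as lines:
        for line in lines:
            if fileinput.isfirstline():
                # Blank line before and after the header so files stand apart
                yield f"\nProcessing file: {fileinput.filename()}\n"
            yield f"-> {line.rstrip()}"


def basic_cat(file_list):
    """Print every line of every file, announcing each new file."""
    for text in cat_lines(file_list):
        print(text)


# Recreate the three sample files in a temporary directory (assumption)
cat_dir = tempfile.mkdtemp()
cat_paths = []
for i, fname in enumerate(["file1.txt", "file2.txt", "file3.txt"]):
    path = os.path.join(cat_dir, fname)
    with open(path, "w") as f:
        f.write(f"Hi I'm file #{i + 1}\n")
    cat_paths.append(path)

basic_cat(cat_paths)
```

Because the demo passes full paths, the header shows the full path of each file; passing bare names from the current directory would show just file1.txt and so on.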
05:31
So now, you have a basically functional version of the Linux utility cat, all made with fileinput. And it has several more useful functions in it than just isfirstline(), input(), and filename(), so I encourage you to take a look at those and see how you can incorporate them in your own programming, because it can be really useful when you have a bunch of files that you want to process all in the same way.
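As one example of those other functions, fileinput.nextfile() skips whatever is left of the current file. The sketch below uses it to grab only the first line of each file; the helper name first_lines() and the sample files are assumptions.

```python
import fileinput
import os
import tempfile


def first_lines(file_list):
    """Map each filename to its first line, skipping the rest of the file.

    first_lines is a hypothetical helper used only for illustration.
    """
    result = {}
    with fileinput.input(files=file_list) as stream:
        for line in stream:
            result[fileinput.filename()] = line.rstrip("\n")
            fileinput.nextfile()  # close this file; the next read starts the next one
    return result


# Sample files in a temporary directory (assumption for the demo)
head_dir = tempfile.mkdtemp()
head_paths = []
for name, text in [("a.txt", "first\nsecond\n"), ("b.txt", "only\n")]:
    path = os.path.join(head_dir, name)
    with open(path, "w") as f:
        f.write(text)
    head_paths.append(path)

for name, line in first_lines(head_paths).items():
    print(os.path.basename(name), line)
```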
05:52
You can just process them all at once with fileinput.input(), which is quite convenient. It’s especially convenient if you want to make command-line applications because it defaults to sys.argv[1:].