Getting File Attributes
00:00 Now you’ve got a list of all the files in your current directory, but what happens when you want to actually get the attributes of those files? Like, the last time they were modified, or when they were created, or their size, or their permissions, or something like that?
00:15 That’s the topic of this lesson.
00:18 Just like in the last lesson, there are three different options for getting file information: two from the os module, and then one from the pathlib module.
00:28 And in fact, I say there are three different options, but there are many more than just these. These are just the three that I’m going to show you today.
00:35 The first is os.stat(), which takes in a path (a string or any path-like object, such as a pathlib.Path) and returns an os.stat_result object containing the file’s data. You can then access the fields of that stat_result object to get the individual attributes.
00:52 Then os.scandir() and pathlib.Path.iterdir(), as I mentioned in the last lesson, provide this information through the objects their iterators yield: os.DirEntry objects for scandir(), and Path objects for iterdir().
01:04 The objects in those iterators all have .stat() methods that return the same data as os.stat(). So again, os.stat() is something that you can use independently of these iteration paradigms, whereas os.scandir() and Path.iterdir() package this .stat() data up with the objects they’re actually iterating through.
01:25 So the iterator approaches are more convenient when you’re already iterating over a directory, but os.stat() might be more convenient if you just need a one-off, “What’s in this file?”
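To illustrate that the three routes return the same kind of data, here is a small sketch that builds a temporary directory with one hypothetical file, file1.py, then compares the size reported by os.stat(), os.DirEntry.stat(), and Path.stat(). It is an illustration under those assumptions, not the lesson’s own code.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A hypothetical sample file; any file in the directory would do.
    Path(tmp, "file1.py").write_bytes(b"x = 1\n")  # 6 bytes

    # One-off lookup: os.stat() on a single path.
    one_off = os.stat(os.path.join(tmp, "file1.py"))

    # During iteration: os.scandir() yields os.DirEntry objects with .stat().
    with os.scandir(tmp) as entries:
        from_scandir = {e.name: e.stat() for e in entries}

    # During iteration: Path.iterdir() yields Path objects with .stat().
    from_iterdir = {p.name: p.stat() for p in Path(tmp).iterdir()}

    # All three report the same size: 6 6 6
    print(one_off.st_size,
          from_scandir["file1.py"].st_size,
          from_iterdir["file1.py"].st_size)
```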
01:33 The sample directory that I’ll use for this is the same as in the last lesson. It has a couple of subdirectories with a few files in each, and then has a few files just in the top-level directory.
01:43 So let’s take a look at this over in the REPL. Okay, file data: how do you get it? Well, let’s try this. First, I just want to get the contents of this directory, and as you can see, there are a lot of files to choose from.
01:56 And the first method that I looked at in the slides was the os.stat() function. That just takes in a file path, a filename, and gives you an os.stat_result object as the result.
02:10 Now, this stat_result object has a lot of different fields, and I can’t really get into all of them, because there are even more than are shown here. It’s just a wealth of information.
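02:19 But as you can see, you have st_mode, which encodes the file type and its permissions. You have st_ino, an identifier for the object that’s unique within its filesystem (on Unix, that’s the inode number). You have all sorts of different things: the number of links, the size, and the access, modification, and ctime timestamps. Be careful with that last one: st_ctime is the creation time on Windows, but on Unix it’s the time of the last metadata change, not the creation time.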
02:30 Some of these will even differ by system: on Windows, Linux, and Mac, some fields have different meanings or values. You really have to be careful and understand your system well to work with them.
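Because st_ctime in particular changes meaning across platforms, here is a hedged sketch that renders st_mode with stat.filemode() and looks for a true creation time in st_birthtime. That attribute exists only on some platforms (macOS and BSD, and Windows from Python 3.12), so the sketch falls back to None with getattr(); the temporary file is just a stand-in.

```python
import os
import stat
import sys
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

info = os.stat(path)

# st_mode packs the file type and permission bits; stat.filemode() renders it
# the way `ls -l` does, e.g. '-rw-------' for a private temp file on Unix.
mode_text = stat.filemode(info.st_mode)
print(mode_text)

# st_ctime means different things on different systems:
#   Unix (Linux, macOS): time of the last metadata change (permissions, rename, ...)
#   Windows: creation time (deprecated for that use since Python 3.12)
print("st_ctime:", info.st_ctime)

# A true creation time, where the platform exposes one, is st_birthtime.
# It is absent on platforms such as Linux, so fall back to None there.
birth = getattr(info, "st_birthtime", None)
print(sys.platform, "st_birthtime:", birth)

os.remove(path)
```

If you need a creation date on a system without st_birthtime, st_mtime is often the closest available value, but it changes whenever the file’s contents are rewritten.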
02:41 To show you just a little bit about how you can actually access these fields, I’m going to focus on st_mtime, the last-modified time, so that I have one thing to show you as I go through these different methods of getting file statistics.
02:58 So, as you can see, if you call this, you can then directly access those fields. So I could say .st_mtime, and this is a time represented as floating-point seconds since the epoch (January 1, 1970, UTC).
03:11 If you’re not familiar with that representation, I would encourage you to check out the Real Python tutorials on the topic. But the gist of it is, I’ll have to convert that if I want a more readable time.
03:21 I’d have to say something like time.ctime(os.stat("file1.py").st_mtime), with the time module imported. So there’s a fair amount of work you’ll need to do here if you want to convert these into nice, readable, human-intelligible formats.
03:36 That’s just because this data is stored, really, at the system level, where everything is just bare numbers.
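For reference, here is a minimal sketch of a few standard ways to turn that float into something readable. It stats the current directory, ".", only because that path always exists; swap in any file you like.

```python
import os
import time
from datetime import datetime, timezone

# Any path works; "." (the current directory) always exists.
mtime = os.stat(".").st_mtime           # float seconds since the Unix epoch

print(time.ctime(mtime))                # local time, e.g. 'Tue Aug 25 10:15:02 2020'
print(datetime.fromtimestamp(mtime))    # local time as a datetime object
print(datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat())  # UTC, ISO 8601

# The epoch itself, as a sanity check:
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())                # 1970-01-01T00:00:00+00:00
```

time.ctime() is the quickest for display; a timezone-aware datetime is easier to compare, sort, or format as ISO 8601.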
03:43 Now that you know how to get these values and convert them, at least a little bit, into a more readable format, I’ll show you the other two ways to get this data that I want to talk about.
03:52 I could say for obj in os.scandir() on the current directory, and then let’s make the output a little bit more informative. Let’s say something like mod_time = obj.stat(); that’s how you get the statistics for an os.DirEntry, the kind of object that scandir() yields.
04:10 And then I can add .st_mtime, and now I can print something nice, like f"File {obj.name} was last modified at", followed by time.ctime() of mod_time.
04:29 And so this should give us everything in a nice readable format. And in fact, you can see that that’s what it does. It tells you each filename and the time that they were last modified.
04:39 And as you can see, most of them were actually at the same time, because I created these all for the purposes of this lesson, but then one of them I modified just recently to give at least some variety.
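Here is a reconstruction of that loop as a self-contained sketch. To avoid depending on whatever is in your current directory, it builds a small temporary stand-in for the sample directory; the file names are hypothetical. The lesson writes for obj in os.scandir() directly, and wrapping the iterator in with is an optional addition that closes the directory handle as soon as the loop finishes.

```python
import os
import tempfile
import time
from pathlib import Path

# Build a small stand-in for the lesson's sample directory
# (these file names are hypothetical).
with tempfile.TemporaryDirectory() as sample_dir:
    for name in ("file1.py", "file2.csv", "file3.txt"):
        Path(sample_dir, name).touch()

    report = []
    with os.scandir(sample_dir) as entries:   # closes the OS handle when done
        for obj in entries:                   # obj is an os.DirEntry
            mod_time = obj.stat().st_mtime    # float seconds since the epoch
            report.append(f"File {obj.name} was last modified at {time.ctime(mod_time)}")

for line in sorted(report):
    print(line)
```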
04:48 Then the pathlib.Path() option here works almost exactly the same way, except you have to create the Path object first, before you can start iterating through it.
04:59 So I create the Path object, and then I can pretty much just copy and paste. I just have to call the .iterdir() function on the dir_path object.
05:08 So in reality, what I’m doing is calling the .iterdir() method of the dir_path object; that would be a better way to say it. But this does exactly the same thing, and the Path objects it yields even have the same .name attribute and .stat() method as the scandir() entries, so that’s super convenient when you’re working with these.
05:21 Really, these two function in almost exactly the same way. So, that’s three different ways to get statistics from files using Python. In the next lesson, I’ll cover something I haven’t covered yet, which is how to create directories.
Liam Pulsifer RP Team on Aug. 25, 2020
Glad to hear you got something out of this lesson @patientwriter! Best wishes for your continuing Python journey :)
patientwriter on Nov. 20, 2020
Hey Liam! Are you still out there? I have an issue I’d like some help with:
I am practicing my nlp on some old files. I have the date in the filename, but in the old days I was not using ISO. I noticed that after my first processing step, the ‘last modified’ date reported by Ubuntu under ‘properties’ on the file was changed. What I want to preserve is the actual creation date, so I used your code to fish that out and save it. On this test file, Ubuntu file properties says:
100211 pot luck.rtf
Accessed: Tue 10 Nov 2020 11∶05∶24 PM CST
Modified: Sun 02 Oct 2011 10∶27∶55 AM CDT
But using your code:
>>> pd = datetime.fromtimestamp(ppp.stat().st_ctime)
>>> pd
datetime.datetime(2019, 12, 4, 18, 57, 46, 765142)
>>> str(pd)
'2019-12-04 18:57:46.765142'
str(pd) != Modified!!!
So I checked atime and mtime:
>>> ppp.stat()
os.stat_result(st_mode=33152, st_ino=8135504, st_dev=2057, st_nlink=1, st_uid=1000, st_gid=1000, st_size=11856, st_atime=1605071124, st_mtime=1317569275, st_ctime=1575507466)
>>> pda = datetime.fromtimestamp(ppp.stat().st_atime)
>>> str(pda)
'2020-11-10 23:05:24.853579'
>>> pdm = datetime.fromtimestamp(ppp.stat().st_mtime)
>>> str(pdm)
'2011-10-02 10:27:55.140625'
As you can see, pdm matches the date in the filename: 100211 pot luck.rtf, so that is almost certainly the actual creation date.
I have no idea where the pda comes from. I haven’t touched this file in years, and certainly not 10 days ago! But it does match what Ubuntu reports for the last accessed time. The same goes for the ctime, which clearly is not the creation time.
All these files are in folders named ‘YYYY’. I randomly (manually) checked a few for 2011, 2012, and 2013. Ubuntu says they all have last accessed dates of:
Tue 10 Nov 2020 11∶05∶31 PM CST
I would think the only way for that to be true would be if I had moved all these folders at the same time. But ‘moving’ isn’t ‘accessing’, is it? More importantly, I did not move these files! I am not saying ‘I forgot’ or ‘I don’t remember’. I’m saying it didn’t happen. I’d remember something like that, especially since, as I said, I am using these files to learn nlp.
I even went so far as to dig up my apt history log to see if I had an update around 10 days ago that may have rewritten all the file metadata, but no such luck.
My computer clock / time is correct.
What is going on here? Thanks.
patientwriter on Aug. 25, 2020
I have been using os.stat(). Did not know about the other two. Great info! Thanks.