Working With SQL
00:00 Working with SQL files. pandas IO Tools can also read and write databases. In this next example, you’ll write your data to a database, but to get started, you’ll need to install the SQLAlchemy package, as seen onscreen now.
In addition, you’ll also need a database driver. Fortunately, Python has a built-in driver for SQLite databases, so that’s what we’ll be using. Once you have SQLAlchemy installed, import
create_engine() and create a database engine as seen onscreen.
Now that you have all that set up, the next step is to create a
DataFrame object as seen onscreen.
.astype() is a convenient method you can use to set multiple data types all at once using a dictionary as seen onscreen. Now the DataFrame can be saved to a database using the
.to_sql() method as seen onscreen.
con is used to specify the database connection or engine that you want to use. The optional parameter
index_label specifies how to call the database column with the row labels.
You’ll often see it take on the value
'ID'. This code had produced the file
data.db, which can be read with an SQL file viewer.
The first column contains the row labels. The other columns correspond to the columns of the DataFrame. To omit writing row labels into the database, pass
index=False to the
.to_sql() method, and this will lead to the database seen onscreen now. There are a few more optional parameters, for example, you can use
schema to specify the database schema and
dtype to determine the types of the database columns. You can also use
if_exists, which says what to do if a database with the same name and path already exists.
if_exists='fail' raises a
ValueError and is the default.
if_exists='replace' would drop the table and insert new values.
if_exists='append' inserts new values into the table. You can load the data from the database using
index_col specifies the name of the column with the row labels. Note that this inserts an extra row after the header that starts with
03:26 You can fix this behavior with the following line of code.
Now you have the same
DataFrame object has before. Note that the content for Russia is now
None instead of
nan. If you want to fill in the missing values with
nan, then you can use
.fillna() as seen onscreen.
.fill_na() replaces all missing values with whatever you pass to
value. Here, you passed
float('nan'), which says to fill all missing values with
Also note that you didn’t have to pass
read_sql(). That’s because the database was able to detect that the last column contains dates. However, you could pass
parse_dates if you’d like, but you’d still get the same results.
@KatMac I think you want to transpose the data frame after having converted column types. Otherwise, the columns will have names corresponding to the consecutive numbers 0, 1, 2, and so on.
Try this instead:
db_df = pd.DataFrame(data=data).astype(dtype=db_types).T
Thank you Bartosz…
In my code, I have updated the db_df line as you suggested. The previous error has disappeared. When I run the code now the following displays (partial display to save space)
0 object 1 object 2 object 3 object 4 object .... 19 object dtype: object
Hi Bartosz, thanks for the lecture. Could you please explain what does ‘echo=false’ mean? Thanks a lot.
@Cindy The lecture and the whole course were made by Darren, while I only commented on it 😊
The best place to learn about the parameters of a function is the official documentation. In this case, you want to take a look at SQLAlchemy’s documentation.
You’ll find the following description there:
if True, the Engine will log all statements as well as a repr() of their parameter lists to the default log handler, which defaults to sys.stdout for output. If set to the string “debug”, result rows will be printed to the standard output as well. The echo attribute of Engine can be modified at any time to turn logging on and off; direct control of logging is also available using the standard Python logging module. (Source)
Become a Member to join the conversation.
KatMac on Sept. 23, 2021
Hi I have typed in the following code.....
and am receiving the following error:
What have I done wrong? and Why did we not assign the COUNTRY type to text in db_types?