Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
There is a newer edition of this item.
Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
- Use the IPython shell and Jupyter notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples
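The groupby bullet in the list above refers to pandas' split-apply-combine facility: rows are split into groups by one or more keys, a function is applied to each group, and the results are combined into a new table. Here is a minimal sketch, assuming pandas is installed. The DataFrame, its column names (`region`, `product`, `sales`) and its values are hypothetical illustration data, not an example taken from the book.

```python
import pandas as pd

# Hypothetical toy dataset used only to illustrate groupby
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "product": ["a", "a", "b", "b", "a"],
    "sales": [10, 20, 30, 40, 50],
})

# Split by region, apply several aggregations to the sales column,
# and combine the results into one summary DataFrame
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])

# Grouping by two keys produces a Series indexed by (region, product) pairs
by_pair = df.groupby(["region", "product"])["sales"].sum()

print(summary)
print(by_pair)
```

Passing a list of keys to `groupby` is one way to "slice and dice" by several dimensions at once; the result can then be reshaped further, for example with `unstack`.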
- ISBN-10: 1491957662
- ISBN-13: 978-1491957660
- Edition: 2nd
- Publisher: O'Reilly Media
- Publication date: October 24, 2017
- Language: English
- Dimensions: 7 x 1.11 x 9.19 inches
- Print length: 550 pages
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While 'data analysis' is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.
New for the Second Edition
The first edition of this book was published in 2012, during a time when open source data analysis libraries for Python (such as pandas) were very new and developing rapidly. In this updated and expanded second edition, I have overhauled the chapters to account for incompatible changes, deprecations, and new features that have appeared in the last five years.
I’ve also added fresh content to introduce tools that either did not exist in 2012 or had not matured enough to make the first cut. Finally, I have tried to avoid writing about new or cutting-edge open source projects that may not have had a chance to mature. I would like readers of this edition to find that the content is still almost as relevant in 2020 or 2021 as it is in 2017.
The major updates in this second edition include:
- All code, including the Python tutorial, updated for Python 3.6 (the first edition used Python 2.7)
- Updated Python install instructions for the Anaconda Python Distribution & other Python packages
- Updates for the latest versions of the pandas library in 2017
- A new chapter on some more advanced pandas tools, and some other usage tips
- A brief introduction to using statsmodels and scikit-learn
- Reorganized content from the first edition to make the book more accessible to newcomers
Editorial Reviews
About the Author
Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.
Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.
Product details
- Publisher : O'Reilly Media; 2nd edition (October 24, 2017)
- Language : English
- Paperback : 550 pages
- ISBN-10 : 1491957662
- ISBN-13 : 978-1491957660
- Item Weight : 1.91 pounds
- Dimensions : 7 x 1.11 x 9.19 inches
- Best Sellers Rank: #152,620 in Books
  - #39 in Data Modeling & Design (Books)
  - #76 in Data Processing
  - #130 in Python Programming
About the author
Since 2007, I have been creating fast, easy-to-use data wrangling and statistical computing tools, mostly in the Python programming language. I am best known for creating the pandas project and writing the book Python for Data Analysis. I am also a contributor to the Apache Arrow, Kudu, and Parquet projects within the Apache Software Foundation. I am currently the CTO and Co-founder of Voltron Data, which builds accelerated computing technologies powered by Apache Arrow. I previously worked for Ursa Labs (within RStudio / Posit), Two Sigma, Cloudera, DataPad, and AQR Capital Management.
Customer reviews
Customers say
Customers find the book provides a good introduction to data analysis using Python. They describe it as an excellent resource with practical content and unique techniques. Many find the content interesting and inspiring, with good chapters on handling time series. The material quality is considered solid, with high-quality paper and colorful graphs. However, opinions differ on the writing style - some find it well-written and simple, while others feel it reads like a dictionary.
AI-generated from the text of customer reviews
Customers find the book's introduction helpful. It teaches them the basics of data analysis using Python and covers most topics they would need to know. The examples are friendly and instructive, making it suitable for beginners.
"...and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized...."
"...This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data..."
"This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but..."
"This book gave me my first job. And I am still learning it. It is simple, talks some general idea why functions design like this, and introduces..."
Customers find the book offers good value for money. They say it's worth buying, especially if you enjoyed the first edition.
"...This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey...."
"Great book, though a bit dry and slow...."
"...There are plenty of code examples. So worth the purchase. Only negative I wish there were mini projects to learn from."
"Overall, I liked the book. Between the first and second edition, however, the author reorganized the book that made it harder for me to absorb...."
Customers find the book useful and practical. They appreciate the unique functions and techniques introduced in it, which are useful for applying Python's tools to data science. The examples work as advertised and improve their working methods. The book is a good resource for engine performance engineers.
"...This book has significantly improved how I work. Thanks, Wes and team."
"...some general idea why functions design like this, and introduces some practical functions...."
"...of this book, and I think it accomplished its goal of being a good general resource for people beginning their career/learning with Python and data..."
"...It's a great book to have as a reference and learning data analysis techniques. There are plenty of code examples. So worth the purchase...."
Customers find the book's content interesting and inspiring. They appreciate the examples and chapters on handling time series.
"So far, this book has been an inspiring reading. It contains a huge number of data cleansing, transformation, analysis & etc. code snippets...."
"Great content. Five star content. But, pages started coming off the binding one day after I got this in the mail...."
"The content is too generic, hope it can be more technological."
"Excellent step-by-step instructions. Interesting examples."
Customers appreciate the book's quality. They find it well-written with high-quality paper and colorful graphics. The coverage is good, especially for Python.
"...copious use of code snippets to illustrate his points makes the material very usable...."
"This product arrived fast. The book was in great shape. Couldn't have asked for a better buying experience"
"I think this book is solid but it was a bit beyond my level...."
"...details about the Pandas library for Python, the author also includes solid sections about the python language and NumPy...."
Customers have different views on the writing style. Some find it well-written and practical, recommending it for beginners and intermediates. They appreciate the colorful graphs and beautiful paper layout. Others feel it reads like a dictionary and is out of date.
"Well written by the creator of Pandas. The author's copious use of code snippets to illustrate his points makes the material very usable...."
"...and rewarding in its use of example datasets, its more personable writing style, and its outlining of good practices for data science."
"...and writes like an impatient person who would rather be doing something else...."
"...bit as detailed as I hoped it would be with a great introduction, great examples, and great coverage of fundamental, basic, and advanced..."
Top reviews from the United States
- Reviewed in the United States on November 24, 2021
I got this book when I was transitioning to doing data science with Python and was struggling to become familiar with standard tools. It's written by the creator of Pandas, and follows the style of the Pandas documentation: dense, telegraphic, peppered with examples.
It's hard work because Wes McKinney often does not articulate why you would need to do something (assuming you are already knowledgeable on the underlying process), and writes like an impatient person who would rather be doing something else. Additionally, examples often suffer from being both too long and too short - too long in that almost every example is on a toy dataset created from scratch, too short in that most of those datasets have only 5 or 10 elements and do not always showcase complex operations. Other examples (particularly involving time series) have an overabundance of data that make the critical results hard to spot. Frankly, my first month with Pandas was a miserable one.
But I give the book 5 stars both because I came to love Pandas as I got more familiar with it, and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized. If anything it would benefit from being longer and with a more patient treatment of larger and more concrete datasets (e.g., the Titanic passenger dataset used in the Pandas documentation). The initial chapter on the basics of using Python could go - if you need this book, then you don't want to be trying to learn the rudiments of Python from it. If you can accept that you'll need a lot of bookmarks or margin notes to get through a rather steep learning curve, it will reward your persistence.
- Reviewed in the United States on April 5, 2019
This book has been my foundation of using python as a data analyst.
This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data (Python pandas is like MS Excel times 100. This is not an exaggeration). It also introduces the reader to NumPy (lower level number crunching and arrays), matplotlib (data visualizations), scikit-learn (machine learning), and other useful data science libraries. The book contains other book recommendations for continuing education.
Although this would be a challenging book for a brand new Python user, I would still recommend it, especially if you are currently doing a lot of work in MS Excel and/or exporting data from databases. I had a few false starts learning Python, and my biggest stumbling block was lack of application in what I was learning. This book puts practical tools in the reader's hands very quickly. I personally don't have time to make goofy games etc. that other books have used as practice examples. Despite other reviews criticizing the use of random data throughout the book, I found the examples easy to follow and useful. I would also argue that learning how to generate random data is useful in itself (thus the purpose of the numpy random library), and that there are practical examples throughout the book. Chapter 14 is devoted to real-world data analysis examples.
I am almost finished with my second time through the book, this time working through every example. This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey. This book has significantly improved how I work.
Thanks, Wes and team.
- Reviewed in the United States on January 26, 2019
This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but it doesn't really offer compelling real-world examples. The data seem to be made up and the analyses don't go into enough detail to help you really learn how pandas and numpy work. Overall this is a decent starter book but you will have to bookmark the python and pandas documentation online if you want to have a reference to all of the functionality those tools have, and there are many places online where you can get better examples to learn from. If you haven't made your mind up about which tool to use for data analysis, I highly recommend checking out dplyr in R, which has an excellent free book online (R for Data Science, Hadley Wickham). I find it very easy to learn and it is much easier to set up R and RStudio than it is to set up Python, even though I love Python and Pandas.
- Reviewed in the United States on December 6, 2017
This book gave me my first job. And I am still learning it. It is simple, talks some general idea why functions design like this, and introduces some practical functions. Because in real life real job you always need to look up documentation or to google certain functions, I think the idea why Wes makes functions/variables like this, and what he wants to develop in the future is very important. Anyway, I think this book is for data analysis beginners and some intermediate users. I learned Python first so I recommend beginners who want to use Python for Data Analyst/Scientist to learn Python Programming first/simultaneously. At least understand lambda and python expressions, otherwise, you can't feel the full magic.
- Reviewed in the United States on February 14, 2022
Well written by the creator of Pandas. The author's copious use of code snippets to illustrate his points makes the material very usable. The snippets are short enough to type by hand so you get the frequent opportunity to play with the code and really understand the tools being presented. And Pandas is awesome!
- Reviewed in the United States on August 16, 2024
This product arrived fast. The book was in great shape. Couldn't have asked for a better buying experience
- Reviewed in the United States on July 21, 2019
So far, this book has been an inspiring reading. It contains a huge number of data cleansing, transformation, analysis & etc. code snippets. The code is very clean and - for the most part - self-explaining (at least, for a seasoned software developer). The book step by step displays the motivations behind the design and functionality of center-piece Python modules - and you would not expect anything less from the original designer of Pandas. I feel this wonderful book being a natural extension of ageless Practical CS classics by Niklaus Wirth, Kernighan-Ritchie, and B. Stroustrup for Data Science Age.
Top reviews from other countries
- hzk, reviewed in Canada on October 12, 2022
5.0 out of 5 stars: clear and succinct
very clearly explains what I need to know and just the amount of details I want, overall very short chapters but enough to solve my problems
- Jovial GBA-GOMBO, reviewed in France on May 20, 2022
5.0 out of 5 stars: Fast and reliable
Purchased to advance my skills as a Data Analyst.
- abhishek patil, reviewed in India on April 8, 2021
5.0 out of 5 stars: Good packing and perfect book for fresher data analytics professional
Good packing from the party. I just loved the overall structure of the book and its content. I will say it is a must for all freshers wishing to get into data analytics and data science!!
- Leonardo D., reviewed in Mexico on December 24, 2019
5.0 out of 5 stars: As complete as it gets
A very complete book compared to many others; it explains important concepts that are very useful for going deeper into many techniques.
- Pedro Dias, reviewed in Spain on January 8, 2021
5.0 out of 5 stars: Must have
You must have this book if you want to learn Pandas and Data Science.