Pandas is a data analysis library for Python that provides fast, easy-to-use data structures and analysis tools. This tutorial covers the basics of Pandas: how to install it, create DataFrames, and perform simple data analysis.
Installation
To install Pandas, you can use pip:
pip install pandas
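To confirm that the installation worked, one quick check is to import the library and print its version. This is a minimal sketch; the version string you see depends on what pip installed, and the variable name installed_version is illustrative only.

```python
import pandas as pd

# pd.__version__ holds the installed Pandas version as a string
installed_version = pd.__version__
print(installed_version)
```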
Creating a DataFrame
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can create a DataFrame from a variety of sources, including lists, dictionaries, and other Pandas objects.
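import pandas as pd

# Each key becomes a column name; each list holds that column's values
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# Build the DataFrame; rows get the default integer index 0, 1, 2
df = pd.DataFrame(data)
print(df)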
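The dictionary form above maps column names to lists of values. As a sketch of another source mentioned earlier, the same table can be built from a list of row records. The names records and df_from_records are illustrative only.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
]

df_from_records = pd.DataFrame(records)
print(df_from_records)
```

Which form is more convenient usually depends on how the data arrives: column-oriented data fits the dictionary of lists, while row-oriented data fits the list of records.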
Data Analysis
Pandas provides a wide range of functions for data analysis. Here are some common operations:
- Selecting data: Use .loc[] for label-based selection and .iloc[] for position-based selection.
- Filtering data: Use boolean indexing to keep only the rows that satisfy a condition.
- Grouping data: Group data by a column and perform operations on each group.
- Aggregating data: Use .sum(), .mean(), .max(), and other aggregation functions.
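The operations listed above can be sketched on the small DataFrame from the previous section. The data is rebuilt here so the snippet runs on its own, and the variable names (bob_age, over_28, and so on) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Selecting: .loc[] uses labels, .iloc[] uses integer positions
bob_age = df.loc[1, 'Age']   # row label 1, column 'Age'
first_row = df.iloc[0]       # first row by position

# Filtering: a boolean mask keeps only rows where the condition is True
over_28 = df[df['Age'] > 28]

# Grouping: one group per city, then the mean age within each group
mean_age_by_city = df.groupby('City')['Age'].mean()

# Aggregating: reduce a whole column to a single value
total_age = df['Age'].sum()
oldest = df['Age'].max()

print(over_28)
print(mean_age_by_city)
```

With the default index, row labels and row positions happen to coincide, so .loc[1] and .iloc[1] would return the same row here. They differ as soon as the index is something other than 0, 1, 2.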
For more detailed information, please refer to the Pandas documentation.
Example
Let's say you want to find the average age of people living in New York:
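# Select the Age values for rows where City is 'New York', then average them
average_age = df.loc[df['City'] == 'New York', 'Age'].mean()
print(average_age)  # 25.0, since Alice is the only New York resident in df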
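One detail worth knowing about this pattern: if no row matches the filter, the selection is empty and .mean() returns NaN rather than raising an error. The sketch below rebuilds the same data and filters on a city that is deliberately absent (Boston is only an illustrative value, as is the name missing_city_avg).

```python
import math

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# No row has City == 'Boston', so the selection is an empty Series
missing_city_avg = df.loc[df['City'] == 'Boston', 'Age'].mean()

# The mean of an empty selection is NaN, so check before using it
if math.isnan(missing_city_avg):
    print('No rows matched the filter')
else:
    print(missing_city_avg)
```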
Further Reading
For more advanced topics, you can explore the following tutorials: