Welcome to the Pandas tutorial! This guide introduces the library's two core data structures and a few common analysis operations: sorting, filtering, and grouping.
What is Pandas?
Pandas is a Python library providing high-performance, easy-to-use data structures and data analysis tools. It is widely used for data manipulation, analysis, and cleaning.
Key Features
- DataFrames: The primary data structure in Pandas, a labeled two-dimensional table whose rows and columns can be accessed by name, similar to a spreadsheet or SQL table.
- Time Series: Tools for working with dates and times, including date parsing, shifting values forward or backward in time, resampling, and rolling-window calculations.
- Data Loading: Support for a wide variety of data formats, including CSV, Excel, SQL, and JSON.
- Data Cleaning: Tools for handling missing data, duplicates, and outliers.
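To make the time series and data cleaning bullets more concrete, here is a small sketch on made-up daily readings. The column names `date` and `reading` and all values are illustrative assumptions, and linear interpolation is just one of several ways to fill a gap (`fillna` and `dropna` are common alternatives).

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with one missing value and one duplicated row
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04"],
    "reading": [10.0, 12.0, 12.0, np.nan, 16.0],
})

# Data cleaning: drop exact duplicate rows, then fill the gap by linear interpolation
clean = raw.drop_duplicates().copy()
clean["reading"] = clean["reading"].interpolate()

# Time series: parse the strings into datetimes and use them as the index
clean["date"] = pd.to_datetime(clean["date"])
ts = clean.set_index("date")["reading"]

# Shift by one period (the previous day's value) and take a 2-day rolling mean
prev_day = ts.shift(1)
rolling_mean = ts.rolling(window=2).mean()

print(pd.DataFrame({"reading": ts, "prev_day": prev_day, "rolling_2": rolling_mean}))
```

The first row of `prev_day` and `rolling_2` is NaN because there is no earlier value to look back on.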
Getting Started
To get started with Pandas, you can install it using pip:
pip install pandas
Once installed, you can import Pandas in your Python script:
import pandas as pd
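A quick way to confirm the installation works is to print the installed version and load a tiny dataset. The sketch below uses `io.StringIO` so that `pd.read_csv` can read CSV text without a file on disk; the sample rows are made up for illustration.

```python
import io
import pandas as pd

# Confirm which pandas version the import picked up
print(pd.__version__)

# read_csv accepts a file path, URL, or file-like object; StringIO stands in for a file here
csv_text = """Name,Age,City
John,28,New York
Anna,22,Paris
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```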
Data Structures
Pandas provides two main data structures: Series and DataFrame.
Series
A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.).
import pandas as pd
s = pd.Series([0, 1, 2, 3, 4, 5])
print(s)
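By default a Series is labeled 0 to n-1, but you can supply your own index. The sketch below (the temperature values and the name `temp_c` are made up) shows label-based and position-based access, vectorized arithmetic, and boolean filtering.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
temps = pd.Series([21.5, 19.0, 25.3], index=["Mon", "Tue", "Wed"], name="temp_c")

print(temps["Tue"])        # label-based access
print(temps.iloc[0])       # position-based access
print(temps * 9 / 5 + 32)  # vectorized arithmetic; the labels are preserved
print(temps[temps > 20])   # a boolean mask keeps only the matching labels
```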
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
import pandas as pd
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)
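Once a DataFrame exists, a few attributes and accessors help you inspect it. This sketch rebuilds the same `df` so it runs on its own; `describe()` only summarizes numeric columns by default.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
}
df = pd.DataFrame(data)

print(df.shape)            # (rows, columns)
print(df.dtypes)           # one dtype per column
print(df['Age'])           # a single column comes back as a Series
print(df.loc[1, 'City'])   # label-based lookup by row label and column name
print(df.describe())       # summary statistics for the numeric columns
```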
Data Analysis
Pandas provides a wide range of functions for data analysis, such as sorting, filtering, and grouping. The examples in this section use the df DataFrame created above.
Sorting
# Sort rows by Age from youngest to oldest; sort_values returns a new DataFrame
df = df.sort_values(by='Age', ascending=True)
print(df)
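`sort_values` also accepts a list of columns for `by` and a matching list of booleans for `ascending`. The sketch below uses hypothetical data in which cities repeat, so the secondary sort key actually matters.

```python
import pandas as pd

# Hypothetical data with repeated cities
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 29],
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin'],
})

# Sort by City (A-Z), then by Age (oldest first) within each city.
# Without inplace=True, sort_values returns a new DataFrame and leaves df unchanged.
by_city_age = df.sort_values(by=['City', 'Age'], ascending=[True, False])
print(by_city_age)

# sort_index() restores the original row order using the index labels
print(by_city_age.sort_index())
```

Reassigning the result, as in the example above, is generally preferred over `inplace=True` because it works the same way in method chains.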
Filtering
# Keep only the rows where the boolean condition Age > 25 is True
filtered_df = df[df['Age'] > 25]
print(filtered_df)
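Conditions can be combined with `&` (and) and `|` (or), and `.loc` lets you filter rows and pick columns in one step. Note that each condition needs its own parentheses, because `&` binds more tightly than `>`. The list of European cities below is just an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Both conditions must hold; isin() tests membership in a list
older_in_europe = df[(df['Age'] > 25) & (df['City'].isin(['Berlin', 'London', 'Paris']))]
print(older_in_europe)

# .loc with a mask can also select a subset of columns
under_30 = df.loc[df['Age'] < 30, ['Name', 'Age']]
print(under_30)

# query() expresses a similar filter as a string
queried = df.query("Age > 25 and City != 'New York'")
print(queried)
```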
Grouping
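grouped_df = df.groupby('City')
# Select the numeric Age column before aggregating; calling .mean() on every column
# would also try to average the text Name column, which raises a TypeError in pandas 2.x
print(grouped_df['Age'].mean())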
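In the sample data every city appears only once, so each group holds a single row. The sketch below uses hypothetical data with repeated cities and the named-aggregation form of `agg`, where each keyword argument becomes an output column built from a (column, function) pair.

```python
import pandas as pd

# Hypothetical data where cities repeat, so each group has more than one row
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Mark'],
    'Age': [28, 22, 34, 29, 40],
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin', 'Berlin'],
})

# Named aggregation: people, mean_age and max_age become the output columns
summary = df.groupby('City').agg(
    people=('Name', 'count'),
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
)
print(summary)
```

By default the group keys become the sorted index of the result (Berlin before Paris here).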
Further Reading
For more information on Pandas, see the official Pandas documentation: https://pandas.pydata.org/docs/