Welcome to the Pandas tutorial! Pandas is a powerful data manipulation and analysis library in Python. It provides high-performance, easy-to-use data structures and data analysis tools.

What is Pandas?

Pandas is an open-source Python library built on top of NumPy. It is widely used for loading, cleaning, transforming, and analyzing tabular and time series data.

Key Features

  • DataFrames: The primary data structure in Pandas, a labeled two-dimensional structure whose rows and columns can be read as a table.
  • Time Series: Time series functionality, including date parsing, shifting, and rolling-window calculations.
  • Data Loading: Support for a wide variety of data formats, including CSV, Excel, SQL, and JSON.
  • Data Cleaning: Tools for handling missing data, duplicates, and outliers.
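
The features above can be sketched in a few lines. This is only an illustration: the CSV content, the column names (date, city, temp), and the variable names are invented for the example and are read from an in-memory string rather than a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally a file path would be passed to read_csv.
csv_text = """date,city,temp
2024-01-01,Paris,5.0
2024-01-02,Paris,
2024-01-02,Paris,
2024-01-03,Paris,7.0
"""

# Data loading: parse the CSV and convert the 'date' column to datetimes.
raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Data cleaning: drop duplicate rows, then rows with a missing temperature.
clean = raw.drop_duplicates().dropna(subset=["temp"])

# Time series: index by date and average each value with the one before it.
ts = clean.set_index("date")["temp"]
rolling = ts.rolling(window=2).mean()
print(rolling)
```

The first rolling value is NaN because a window of two rows has no predecessor to average with.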

Getting Started

To get started with Pandas, you can install it using pip:

pip install pandas

Once installed, you can import Pandas in your Python script:

import pandas as pd

Data Structures

Pandas provides two main data structures: Series and DataFrame.

Series

A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). If no labels are given, Pandas assigns the integer positions 0, 1, 2, ... as the index.

import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5])
print(s)
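
To show what "labeled" means in practice, here is a small sketch with an explicit index. The labels 'a', 'b', and 'c' are arbitrary and chosen only for illustration.

```python
import pandas as pd

# A Series with string labels instead of the default 0, 1, 2, ... index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s["b"]        # label-based lookup
by_position = s.iloc[0]  # position-based lookup
doubled = s * 2          # element-wise arithmetic keeps the labels

print(by_label, by_position)
print(doubled)
```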

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)
print(df)
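
Once a DataFrame exists, a few attributes and indexers cover most day-to-day inspection. The sketch below rebuilds the same table so it can run on its own; the variable names (ages, first, subset) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 22, 34, 29],
    "City": ["New York", "Paris", "Berlin", "London"],
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # one data type per column

ages = df["Age"]   # a single column is returned as a Series
first = df.iloc[0] # first row, selected by position

# .loc selects rows by a boolean condition and columns by name.
subset = df.loc[df["City"] == "Paris", ["Name", "Age"]]
print(subset)
```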

Data Analysis

Pandas provides a wide range of functions for data analysis, such as sorting, filtering, and grouping.

Sorting

df.sort_values(by='Age', ascending=True, inplace=True)
print(df)
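
As an alternative to modifying the DataFrame in place, sort_values can return a new, sorted copy and leave the original untouched. A minimal sketch, assuming the same sample table as above (sorted_df is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 22, 34, 29],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# Sort from oldest to youngest and renumber the rows 0..n-1.
sorted_df = df.sort_values(by="Age", ascending=False).reset_index(drop=True)
print(sorted_df)

# The original DataFrame keeps its row order.
print(df)
```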

Filtering

filtered_df = df[df['Age'] > 25]
print(filtered_df)
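
Conditions can also be combined. The sketch below, again using the sample table, joins two boolean masks with & (each condition needs its own parentheses) and shows query as a string-based way to write a similar filter. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 22, 34, 29],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# Rows where both conditions hold: older than 25 AND living in one of two cities.
mask = (df["Age"] > 25) & (df["City"].isin(["Berlin", "London"]))
combined = df[mask]
print(combined)

# A similar filter written as a query string.
queried = df.query("Age > 25 and City != 'New York'")
print(queried)
```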

Grouping

# Select the numeric column before aggregating; 'Name' is text and cannot be averaged.
grouped_df = df.groupby('City')
print(grouped_df['Age'].mean())
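
Grouping is easier to see when a key repeats. The sample table has one person per city, so the sketch below uses a small invented dataset with repeated cities and computes several statistics per group with agg.

```python
import pandas as pd

# Invented data for illustration: two cities appear more than once.
people = pd.DataFrame({
    "City": ["Paris", "Paris", "Berlin", "Berlin", "London"],
    "Age": [22, 30, 34, 26, 29],
})

# One row per city, one column per statistic.
summary = people.groupby("City")["Age"].agg(["mean", "max", "count"])
print(summary)
```

Selecting the Age column before aggregating keeps the result limited to numeric data.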

Further Reading

For more information on Pandas, visit the official documentation: https://pandas.pydata.org/docs/