
Pandas

Information Technology > Business intelligence and data analysis

Description

Pandas is a Python library for data manipulation and analysis. It provides data structures and functions for working with structured data, including readers and writers for formats such as CSV, Excel, and SQL databases. With Pandas, you can filter and sort data, handle missing values, merge and reshape datasets, apply mathematical operations, and perform aggregations. Advanced features include time series handling, pivot tables, and plotting built on Matplotlib. As you gain proficiency, you can optimize performance, extend Pandas' functionality, and integrate it with other libraries like NumPy and Matplotlib.

Stack

Python

Expected Behaviors

LEVEL 1

Fundamental Awareness

At this level, individuals have a basic understanding of what Pandas is and its uses. They are familiar with the primary data structures in Pandas, such as Series and DataFrame. They can import the Pandas library and create a simple DataFrame.
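
As a minimal sketch of what this level looks like in practice, the snippet below imports Pandas and builds one Series and one DataFrame. The fruit data and variable names are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"], name="price")

# A DataFrame is a two-dimensional table whose columns are Series.
df = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "cherry"],
        "price": [1.5, 2.0, 3.25],
        "stock": [10, 0, 7],
    }
)

print(prices)
print(df)
print(df.shape)  # (rows, columns)
```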

LEVEL 2

Novice

Novices can load data from various file formats into a DataFrame and inspect it using methods like head, tail, and describe. They have basic data manipulation skills, including sorting, filtering, and adding/removing columns. They also know how to handle missing data.
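
The novice workflow might look roughly like the sketch below. The CSV content is invented and read from an in-memory string so the example is self-contained; with a real file, a path would be passed to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """city;population;area_km2
Alpha;120000;50.0
Beta;45000;
Gamma;300000;210.5
Delta;8000;12.0
"""

# sep handles a non-default delimiter; the empty field becomes NaN.
cities = pd.read_csv(io.StringIO(csv_text), sep=";")

print(cities.head(2))     # first two rows
print(cities.tail(1))     # last row
print(cities.describe())  # summary statistics for numeric columns
print(cities.dtypes)      # data type of each column

# Sort, filter, add a column, drop a column.
by_size = cities.sort_values("population", ascending=False)
large = cities[cities["population"] > 40000]
cities["density"] = cities["population"] / cities["area_km2"]
trimmed = cities.drop(columns=["area_km2"])
```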

LEVEL 3

Intermediate

Intermediate users can perform more complex data manipulations, such as merging, joining, and reshaping data. They understand how to apply functions to data and group and aggregate it. They can handle time series data and use string methods and regular expressions in Pandas.
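
The following sketch illustrates string methods with a regular expression, applying a function, grouping and aggregating, and a simple time series resample. The orders table, the SKU pattern, and the 1.2 tax multiplier are all assumptions made for the example.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]
        ),
        "customer": ["  alice ", "Bob", "ALICE", "bob"],
        "sku": ["AB-100", "CD-200", "AB-101", "EF-300"],
        "amount": [20.0, 35.0, 15.0, 50.0],
    }
)

# String methods: normalize names, then pull the letter prefix out with a regex.
orders["customer"] = orders["customer"].str.strip().str.lower()
orders["sku_prefix"] = orders["sku"].str.extract(r"^([A-Z]+)-", expand=False)

# Apply a function to each value of a column (hypothetical tax multiplier).
orders["amount_with_tax"] = orders["amount"].apply(lambda x: round(x * 1.2, 2))

# Group and aggregate: total and mean amount per customer.
per_customer = orders.groupby("customer")["amount"].agg(["sum", "mean"])

# Time series: index by date and resample to daily totals.
daily = orders.set_index("date")["amount"].resample("D").sum()

print(per_customer)
print(daily)
```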

LEVEL 4

Advanced

Advanced users can use advanced indexing techniques and perform advanced data cleaning tasks. They understand how to optimize performance in Pandas and use it for data visualization. They can use advanced features like pivot tables, crosstab, rolling and expanding windows.
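
A short sketch of the features named above, using an invented sales table: a pivot table, a crosstab, a rolling mean, and an expanding sum.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "north", "south"],
        "product": ["a", "b", "a", "b", "a", "a"],
        "units": [10, 4, 7, 3, 6, 5],
    }
)

# Pivot table: total units per region (rows) and product (columns).
table = pd.pivot_table(
    sales, index="region", columns="product", values="units",
    aggfunc="sum", fill_value=0,
)

# Crosstab: how many rows fall into each region/product combination.
counts = pd.crosstab(sales["region"], sales["product"])

# Rolling window: mean of the current value and the two before it.
rolling_mean = sales["units"].rolling(window=3).mean()

# Expanding window: running total over all rows seen so far.
running_total = sales["units"].expanding().sum()

print(table)
print(counts)
```

The first two entries of rolling_mean are NaN because a window of three is not yet full there; passing min_periods=1 would relax that.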

LEVEL 5

Expert

Experts have a deep understanding of how Pandas works under the hood. They can write efficient code using Pandas and use it in combination with other libraries. They know how to extend Pandas by defining custom functions or subclasses. They can troubleshoot and solve complex problems using Pandas.
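
One way to extend Pandas without subclassing is a custom accessor. The sketch below registers a hypothetical "audit" accessor; the accessor name and its missing_share method are invented for illustration, while register_dataframe_accessor is the Pandas extension hook.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("audit")
class AuditAccessor:
    """Hypothetical accessor exposing a data-quality check as df.audit."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def missing_share(self):
        # Fraction of missing values per column, returned as a Series.
        return self._obj.isna().mean()


df = pd.DataFrame({"a": [1.0, None, 3.0, None], "b": ["x", "y", "z", "w"]})
shares = df.audit.missing_share()
print(shares)
```

An accessor keeps custom logic in its own namespace, which tends to be less fragile than subclassing DataFrame, since many Pandas operations return plain DataFrame objects.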

Micro Skills

LEVEL 1

Fundamental Awareness

Knowledge of the purpose of Pandas
Familiarity with the types of tasks Pandas can be used for
Understanding of how Pandas fits into the data analysis workflow
Understanding of what a Series is
Understanding of what a DataFrame is
Knowledge of the differences between Series and DataFrame
Knowledge of the correct syntax to import Pandas
Understanding of Python's import statement
Ability to troubleshoot common issues when importing libraries
Understanding of the syntax to create a DataFrame
Ability to create a DataFrame from a list or dictionary
Knowledge of how to specify column names when creating a DataFrame
Understanding of how to view the created DataFrame
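
To make the Series and DataFrame distinction in the list above concrete, here is a small sketch with made-up data. It builds a DataFrame from a list of rows with explicit column names, then selects the same column as a Series and as a DataFrame.

```python
import pandas as pd

# Build a DataFrame from a list of rows, naming the columns explicitly.
rows = [["Ada", 36], ["Grace", 45], ["Linus", 28]]
people = pd.DataFrame(rows, columns=["name", "age"])

# Single brackets select one column as a Series (one-dimensional).
ages = people["age"]

# Double brackets select a list of columns and keep a DataFrame (two-dimensional).
ages_frame = people[["age"]]

print(people)
print(type(ages).__name__, ages.shape)
print(type(ages_frame).__name__, ages_frame.shape)
```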
LEVEL 2

Novice

Understanding of how to use read_csv, read_excel, read_sql functions
Knowledge of handling different delimiters, column specifications, and other parameters while reading files
Ability to handle errors during data loading
Knowledge of using head and tail functions to view first and last n rows
Understanding of how to use the describe function to generate descriptive statistics
Ability to use info and dtypes to check data types of columns
Understanding of how to sort data based on one or more columns
Ability to filter data based on conditions
Knowledge of how to add new columns to a DataFrame
Understanding of how to drop columns from a DataFrame
Understanding of how to identify missing data using isnull or notnull
Ability to remove rows or columns with missing data using dropna
Knowledge of how to fill missing data using fillna
Understanding of how to interpolate missing values
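
The missing-data skills in the list above can be sketched as follows, using an invented sensor table. Filling with the column mean is only one possible strategy and is chosen here for illustration.

```python
import numpy as np
import pandas as pd

readings = pd.DataFrame(
    {
        "sensor": ["s1", "s2", "s3", "s4", "s5"],
        "temp": [20.0, np.nan, 24.0, np.nan, 28.0],
        "note": [None, "ok", None, None, "ok"],
    }
)

# Count missing values per column.
missing_per_column = readings.isnull().sum()

# Drop rows with any missing value, or only rows missing a specific column.
complete_rows = readings.dropna()
rows_with_temp = readings.dropna(subset=["temp"])

# Fill gaps with a constant (here the column mean) or interpolate linearly.
filled = readings["temp"].fillna(readings["temp"].mean())
interpolated = readings["temp"].interpolate()

print(missing_per_column)
print(interpolated)
```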
LEVEL 3

Intermediate

Knowledge of syntax and parameters of merge function
Knowledge of syntax and parameters of join function
Understanding of syntax and parameters of concat function
Understanding of syntax and parameters of melt function
Understanding of syntax and parameters of pivot function
Knowledge of syntax and parameters of stack function
Understanding of syntax and parameters of unstack function
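
The combining and reshaping functions listed above can be compared side by side in a short sketch. All tables and column names here are hypothetical.

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Ann", "Ben", "Cy"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [50, 60, 70]})

# merge: SQL-style join on a key column; how="left" keeps every employee.
merged = employees.merge(salaries, on="emp_id", how="left")

# join: aligns on the index instead of a column.
joined = employees.set_index("emp_id").join(salaries.set_index("emp_id"), how="inner")

# concat: stack frames vertically and rebuild the index.
more = pd.DataFrame({"emp_id": [5], "name": ["Di"]})
combined = pd.concat([employees, more], ignore_index=True)

# melt: wide to long.
wide = pd.DataFrame({"name": ["Ann", "Ben"], "q1": [1, 3], "q2": [2, 4]})
long = wide.melt(id_vars="name", var_name="quarter", value_name="score")

# pivot: long back to wide.
back = long.pivot(index="name", columns="quarter", values="score")

# stack / unstack: move column labels into the row index and back.
stacked = back.stack()
unstacked = stacked.unstack()
```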
LEVEL 4

Advanced

Ability to create MultiIndex
Ability to modify MultiIndex
Knowledge of how to select data using MultiIndex
Understanding of other index types (DatetimeIndex, PeriodIndex, CategoricalIndex)
Understanding of how to detect and remove duplicates
Ability to replace values in a DataFrame
Knowledge of how to normalize data
Understanding of how to handle outliers
Knowledge of how to use efficient data types
Ability to use vectorized operations instead of loops
Understanding of how to avoid chained indexing (chained assignment) by using .loc
Knowledge of how to use the 'inplace' parameter correctly
Understanding of how to create basic plots (line, bar, scatter, histogram)
Ability to customize plots (title, labels, legend)
Knowledge of how to save plots to file
Understanding of how to create more complex plots (boxplot with Pandas; heatmap and pairplot with Seaborn)
Understanding of how to create and manipulate pivot tables
Ability to use the crosstab function to create frequency tables
Knowledge of how to calculate rolling statistics
Understanding of how to use expanding windows for cumulative calculations
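
Several of the data cleaning, indexing, and performance skills above can be sketched together. The table is invented, and the choice of category and downcast integer dtypes is an illustrative assumption, not a universal recommendation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["US", "US", "DE", "DE", "DE"],
        "city": ["NYC", "LA", "Berlin", "Bonn", "Bonn"],
        "sales": [100, 80, 60, 20, 20],
    }
)

# Detect and remove exact duplicate rows.
n_dupes = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# One .loc call instead of chained indexing such as df[mask]["sales"] = ...
df.loc[df["city"] == "Bonn", "sales"] = 25

# Vectorized arithmetic instead of a Python loop over rows.
df["sales_k"] = df["sales"] / 1000

# MultiIndex: build from columns, then select by outer level or by full key.
indexed = df.set_index(["country", "city"]).sort_index()
german_rows = indexed.loc["DE"]
berlin_sales = indexed.loc[("DE", "Berlin"), "sales"]

# Efficient dtypes: repeated strings as category, small integers downcast.
df["country"] = df["country"].astype("category")
df["sales"] = pd.to_numeric(df["sales"], downcast="integer")

print(indexed)
print(df.dtypes)
```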
LEVEL 5

Expert

Understanding of the underlying data structures used by Pandas (NumPy arrays, extension arrays, index objects)
Knowledge of how indexing is implemented in Pandas
Understanding of how operations are vectorized in Pandas
Familiarity with the source code of Pandas
Proficiency in using vectorized operations instead of loops
Understanding of when the 'inplace' parameter actually saves memory (many operations still copy data internally)
Knowledge of how to use methods like 'eval' and 'query' for efficient computations
Ability to use categorical data to improve performance
Ability to use NumPy functions on Pandas objects
Understanding of how to plot data from Pandas objects using Matplotlib or Seaborn
Knowledge of how to use Pandas together with Scikit-learn for machine learning tasks
Ability to use Pandas with statsmodels for statistical analysis
Ability to define custom aggregation functions
Understanding of how to subclass DataFrame or Series
Knowledge of how to extend Pandas with custom dtypes or extension arrays
Ability to define custom accessors
Proficiency in debugging Pandas code
Ability to find and fix performance issues
Understanding of how to handle edge cases in data manipulation tasks
Knowledge of how to deal with issues related to missing or inconsistent data
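
A few of the expert skills above, sketched on invented data: query and eval with expression strings, a NumPy function applied to a Series, and a custom aggregation. The value_range helper is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b"],
        "x": [1.0, 4.0, 9.0, 16.0, 25.0],
        "y": [2.0, 2.0, 3.0, 3.0, 5.0],
    }
)

# query filters rows with an expression string; eval computes a new column.
subset = df.query("x > 3 and y < 5")
df = df.eval("ratio = x / y")

# NumPy ufuncs operate on Pandas objects and keep the index.
roots = np.sqrt(df["x"])


# Custom aggregation: a hypothetical helper passed to agg.
def value_range(values: pd.Series) -> float:
    return values.max() - values.min()


ranges = df.groupby("group")["x"].agg(value_range)

print(subset)
print(ranges)
```

Whether query and eval are faster than plain boolean indexing depends on the data size and on whether the optional numexpr package is installed, so it is worth measuring before relying on them.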

Skill Overview

  • Expert: 2 years experience
  • Micro-skills: 74
  • Roles requiring skill: 3
