Data Science

100 Prompts

4.7/5 (11)

Copy, paste and use them in your favorite AI:

OpenAI
Claude
Gemini
DeepSeek
Grok
Qwen
Claude Code
Cursor
Antigravity
NotebookLM

What you'll achieve

—Manipulate data with Pandas and SQL
—Visualize aesthetically with Matplotlib
—Train regression and clustering models
—Build Streamlit dashboards
—Do storytelling and prep interviews

Launch price

$7.90 USD (was $15.80 USD)

Just $0.08 per prompt · one-time payment

100 prompts

Lifetime

All AIs

+9,500 professionals already use GoPrompts AI

See examples

Collection Content

100 resources included

Pattern extraction with regular expressions

Memory optimization in DataFrames

Custom aggregations using Lambda functions

Efficient management of categorical data

Dynamic pivoting of time series

Handling hierarchical multi-level indexes

Interpolation of numeric null values

Vectorization of heavy mathematical calculations

Advanced filtering by Boolean conditions

Complex joins of multiple sources

About this collection

This definitive collection of AI prompts for Data Science has been specifically designed to transform professionals and students into high-performing experts. Through a meticulous structure, this library covers everything from technical data manipulation to strategic communication of findings, allowing you to automate complex workflows and increase the accuracy of your predictive models in record time. By integrating these prompts into your workflow, you will gain an immediate competitive advantage in the job market. Each instruction is optimized to generate clean code, rigorous statistical analysis, and impactful visualizations, ensuring that every stage of your data pipeline meets the most demanding standards in today's technology industry.

Prompt sample

Prompt #1

Efficient management of categorical data

First, perform a thorough diagnostic of the [lista_columnas_categoricas] columns using memory profiling methods to compare the 'object' data type against the 'category' type. Explain in detail how the Pandas Categorical class's dictionary-based storage reduces the memory footprint and improves the speed of groupby and filtering operations compared to raw text strings.

Second, implement a differentiated strategy depending on the cardinality of the data. For variables with low cardinality, apply One-Hot Encoding using pd.get_dummies or Scikit-Learn, dropping the first dummy column to avoid the multicollinearity trap. For columns with high cardinality like [columna_alta_cardinalidad], implement Target Encoding or Frequency Encoding, carefully handling potential data leakage by using cross-validation or smoothing.

Third, address the handling of unseen categories and null values. The script should proactively assign an 'Unknown' category and transform the columns to be compatible with Machine Learning algorithms that do not accept non-numeric values. Include a benchmarking section where you measure the execution time of a complex aggregation operation before and after data type optimization.

Finally, generate a reusable function called [nombre_funcion_limpieza] that automates this entire flow, accepts the cardinality threshold as a parameter to decide the encoding method, and returns a comparative report of the memory savings in megabytes (MB) and as a percentage (%). Make sure the code follows PEP 8 best practices and is properly documented with docstrings.

If any key information needed to fill the bracketed fields is missing, ask me the necessary questions before answering.
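As a rough illustration of the kind of script this prompt asks an AI to write, here is a minimal sketch. It assumes the columns hold plain strings. The function name `encode_categoricals`, the `max_onehot_levels` threshold, and the toy columns are hypothetical stand-ins for the bracketed fields. Target Encoding and the timing benchmark are left out to keep it short.

```python
import pandas as pd


def encode_categoricals(df, columns, max_onehot_levels=10):
    """Encode `columns` by cardinality and report memory savings.

    Hypothetical helper: low-cardinality columns are one-hot encoded,
    high-cardinality columns are frequency encoded.
    """
    out = df.copy()
    # Nulls become an explicit 'Unknown' level before any encoding.
    out[columns] = out[columns].fillna("Unknown")

    # Compare the string representation with the 'category' representation.
    before = out[columns].memory_usage(deep=True, index=False).sum()
    as_category = out[columns].astype("category")
    after = as_category.memory_usage(deep=True, index=False).sum()
    report = {
        "before_mb": before / 1024**2,
        "after_mb": after / 1024**2,
        "saved_pct": 100 * (before - after) / before,
    }

    for col in columns:
        if out[col].nunique() <= max_onehot_levels:
            # drop_first=True removes one dummy per column (multicollinearity).
            dummies = pd.get_dummies(
                out[col], prefix=col, drop_first=True, dtype="int8"
            )
            out = pd.concat([out.drop(columns=col), dummies], axis=1)
        else:
            # Frequency encoding: replace each level with its relative frequency.
            freq = out[col].value_counts(normalize=True)
            out[col] = out[col].map(freq)
    return out, report


# Toy data: 'color' has few levels (plus nulls), 'user_id' has 400 levels.
df = pd.DataFrame({
    "color": ["red", "blue", None, "red"] * 250,
    "user_id": [f"u{i % 400}" for i in range(1000)],
})
encoded, report = encode_categoricals(
    df, ["color", "user_id"], max_onehot_levels=5
)
print(report)
```

Frequency encoding is used here because it needs no target column. A Target Encoding variant would have to be fitted inside cross-validation folds, as the prompt notes, to avoid leakage.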


Frequently asked questions

How do I receive the prompts?

Instant access after purchase from your dashboard. Just copy and paste into your AI.

Which AI do they work with?

ChatGPT, Claude, Gemini, DeepSeek, Grok, Qwen and any AI chat.

Can I adapt them to my case?

Yes. Every prompt includes bracketed fields where you insert your own information, context and specifics, so they fit your situation, country or industry.

Can I see them before buying?

Yes. Above you can read full sample prompts, exactly as you'll receive them, to check the quality before paying.

Is it a one-time payment?

Yes. Pay once and they're yours forever, updates included.

Launch offer

Unlock every collection

For $9.90 USD/month you get 42,300+ prompts across 422 collections. Cancel anytime.

100 Prompts ready to use

Lifetime updates

Compatible with multiple languages

Works with 6 AIs

Act as a Senior Data Scientist specializing in data engineering and performance optimization in Python. Your objective is to develop a highly optimized Pandas script for handling categorical variables in the [nombre_del_dataset] dataset, which presents scalability challenges and high RAM consumption.

First, perform a thorough diagnostic of the [lista_columnas_categoricas] columns using memory profiling methods to compare the 'object' data type against the 'category' type. Explain in detail how the Pandas Categorical class's dictionary-based storage reduces the memory footprint and improves the speed of groupby and filtering operations compared to raw text strings.

Second, implement a differentiated strategy depending on the cardinality of the data. For variables with low cardinality, apply One-Hot Encoding using pd.get_dummies or Scikit-Learn, dropping the first dummy column to avoid the multicollinearity trap. For columns with high cardinality like [columna_alta_cardinalidad], implement Target Encoding or Frequency Encoding, carefully handling potential data leakage by using cross-validation or smoothing.

Third, address the handling of unseen categories and null values. The script should proactively assign an 'Unknown' category and transform the columns to be compatible with Machine Learning algorithms that do not accept non-numeric values. Include a benchmarking section where you measure the execution time of a complex aggregation operation before and after data type optimization.

Finally, generate a reusable function called [nombre_funcion_limpieza] that automates this entire flow, accepts the cardinality threshold as a parameter to decide the encoding method, and returns a comparative report of the memory savings in megabytes (MB) and as a percentage (%). Make sure the code follows PEP 8 best practices and is properly documented with docstrings.

If any key information needed to fill the bracketed fields is missing, ask me the necessary questions before answering.

Act as an expert Senior Data Scientist and Data Architect specialized in optimizing processing pipelines with Python. Your mission is to develop an advanced and highly efficient script using the Pandas library to perform complex custom aggregations on a large-scale dataset hosted on [File_Name_or_Source]. The goal is not simply to apply basic statistical functions, but to design sophisticated aggregation logic using Lambda expressions within methods such as .groupby() and .agg(), so that you can extract insights that are not possible with predefined functions.

The dataset contains critical information about [Describe Nature of Data, e.g.: banking transactions, telemetry logs, e-commerce user behavior] and presents specific challenges such as null values, extreme outliers and heterogeneous data types. You must implement a Lambda function that calculates a [Specify Complex Metric, e.g.: weighted Gini index, seasonally adjusted conversion ratio, or probabilistic churn calculation] for each group defined by the [Group_Criteria_Column] column. The logic must handle internal conditions (if-else) and vectorized NumPy operations within the Lambda itself to maximize computational performance in memory-limited environments.

In addition to the code, provide a deep technical analysis of the efficiency of the proposed solution. Compare the Lambda function with alternatives such as .apply() or native vectorized functions, explaining when each approach is preferable in terms of Python overhead versus C execution speed. Provide recommendations for optimizing memory consumption by downcasting numeric data types and using categorical types in grouping columns before running aggregation operations.

The final result must be delivered as a modular script, documented under PEP 8 standards, and ready to be integrated into a production workflow in [Deployment_Environment, e.g.: AWS Glue, Azure Databricks or local environment]. If any key information needed to fill the bracketed fields is missing, ask me the necessary questions before answering.
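To give a sense of the Lambda-versus-vectorized trade-off this prompt asks the AI to discuss, here is a small sketch. The columns `segment`, `amount` and `weight` and both metrics are made up for illustration; they stand in for the bracketed fields and are not part of the collection.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 30.0, 5.0, np.nan, 15.0],
    "weight": [1.0, 3.0, 2.0, 1.0, 2.0],
})

# Lambdas inside .agg(): each one receives a single column of one group
# as a Series, so conditional logic can be written with NumPy.
summary = df.groupby("segment").agg(
    amount_range=("amount", lambda s: s.max() - s.min()),
    share_over_10=("amount", lambda s: np.where(s > 10, 1.0, 0.0).mean()),
)

# A weighted mean needs two columns at once, which an .agg() lambda
# cannot see. Vectorized alternative: precompute the product, then rely
# on the built-in group sum, which runs in compiled code.
valid = df.dropna(subset=["amount"])
sums = (
    valid.assign(wx=valid["amount"] * valid["weight"])
    .groupby("segment")[["wx", "weight"]]
    .sum()
)
weighted_mean = sums["wx"] / sums["weight"]

print(summary)
print(weighted_mean)
```

The lambdas run once per group in Python, so their cost grows with the number of groups, while the precomputed-column version stays in vectorized code throughout.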

Act as a Senior Data Scientist and Data Engineering expert specialized in the PyData ecosystem. Your objective is to design a Python script using the Pandas library to perform dynamic and advanced pivoting of time series, transforming data structures from 'long' format to 'wide' format in an efficient and scalable way.

The central problem is to process a dataset called [nombre_del_dataset] that contains high-frequency metrics captured from multiple sources or sensors. The original DataFrame has the columns [columna_tiempo], [columna_identificador] and [columna_valor]. The pivot process must not only reshape the data, but also handle duplicate index entries using a custom aggregation function defined as [funcion_agregacion], which must handle null values (NaN) using the [estrategia_imputacion] strategy.

Additionally, the solution must integrate a dynamic resampling phase. Before or during pivoting, the data must be grouped into time intervals of [frecuencia_temporal] (for example, '5min', '1H', 'D'). You must ensure that the resulting index is a clean DatetimeIndex and that there are no time gaps; to do this, use the 'reindex' or 'asfreq' method to fill in the missing periods in the range between [fecha_inicio] and [fecha_fin].

Optimize code performance, considering that the data volume can exceed [millones_de_filas] records. Use the categorical dtype (Categorical Data) for the [columna_identificador] column in order to reduce RAM usage. The final script should include a validation section that verifies the shape of the resulting DataFrame and generates a quick statistical summary of the pivoted columns to detect anomalies immediately.

Finally, provide documented code following PEP 8 standards, including detailed comments on why 'pd.pivot_table' is preferred over 'df.pivot' in production scenarios with noisy data, and how Pandas vectorization improves processing speed compared to traditional iterative loops. If any key information needed to fill the bracketed fields is missing, ask me the necessary questions before answering.
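The long-to-wide step described in this prompt can be sketched roughly as follows. The column names `timestamp`, `sensor` and `value`, the 5-minute frequency, the mean aggregation and the date range are illustrative assumptions that take the place of the bracketed fields.

```python
import pandas as pd

long_df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:01", "2024-01-01 00:03",  # two s1 readings, one bucket
        "2024-01-01 00:02", "2024-01-01 00:16",
    ]),
    # Categorical identifiers keep memory low when there are many rows.
    "sensor": pd.Categorical(["s1", "s1", "s2", "s1"]),
    "value": [1.0, 3.0, 10.0, 5.0],
})

# Floor timestamps into 5-minute buckets, then pivot long -> wide.
# pivot_table aggregates duplicate (bucket, sensor) pairs, whereas
# DataFrame.pivot raises an error when duplicates are present.
wide = long_df.assign(bucket=long_df["timestamp"].dt.floor("5min")).pivot_table(
    index="bucket",
    columns="sensor",
    values="value",
    aggfunc="mean",
    observed=True,
)

# Reindex onto a complete grid so missing periods show up as NaN rows
# instead of silently disappearing.
full_index = pd.date_range("2024-01-01 00:00", "2024-01-01 00:15", freq="5min")
wide = wide.reindex(full_index)

# Quick validation: expected shape and a summary of the pivoted columns.
print(wide.shape)
print(wide.describe())
```

Imputation is deliberately left out: the gaps stay as NaN so that the chosen strategy (forward fill, interpolation, and so on) can be applied as a separate, visible step.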

Collection Content

Advanced Manipulation with Pandas

Aesthetic Visualization with Matplotlib

Professional Dataset Cleaning

Linear Regression Algorithms

Classification in Machine Learning

Unsupervised Clustering and Segmentation

SQL Queries for Analysts

Interactive Dashboards with Streamlit

Storytelling and Data Narrative

Technical Interview Preparation

What customers say

Does the job

Useful

Highly recommended

Top quality

Excellent

Useful