Part I

Big Countries (筛选)

A country is big if:一个国家如果很大：

it has an area of at least three million (i.e., 3000000 km2), or
面积至少达到 300万（即3000000平方公里），或

it has a population of at least twenty-five million (i.e., 25000000).
人口至少有两千五百万（即25000000）。

Write a solution to find the name, population, and area of the big countries.写出求大国名称、人口和面积的解法。

Example 1: 

Input: 
World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output: 
+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+

Code:
import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    df = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
    return df[['name','population','area']]

Recyclable and Low Fat Products - 可回收和低脂产品 (筛选)

Write a solution to find the ids of products that are both low fat and recyclable.写一个解决方案，找出既低脂又可回收的产品 ID。

Return the result table in any order.按任意顺序返回结果表。

Example 1:

Input: 
Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+
Output: 
+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+
Explanation: Only products 1 and 3 are both low fat and recyclable.

Code1:
import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    products_new = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
    return products_new[['product_id']]

Customers Who Never Order - 从不订购的客户 (重名, 筛选)

Write a solution to find all customers who never order anything.写一个解决方案，找出所有从不订购任何东西的客户。

Return the result table in any order.按任意顺序返回结果表。

Approach 方法

使用isin()方法确定客户DataFrame中的每个 id 是否存在于订单DataFrame中。

应用否定运算符(~)选择 ID 不在订单列表中的客户。

筛选后，只选择客户 DataFrame 中的姓名列

将列名称重命名为 "客户"，使结果更加清晰。

Input: 
Customers table:
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+
Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+
Output: 
+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

Code1：
import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    filtered_customers = customers[~customers['id'].isin(orders['customerId'])]
    result = filtered_customers[['name']].rename(columns={'name': 'Customers'})
    return result

Article View I - 文章观点 I (排序, 合并)

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
该表的每一行都表示某位读者在某日浏览了某篇文章（由某位作者撰写）。 
请注意， author_id 和 viewer_id 表示同一个人。

写一个解决方案，找出所有至少浏览过一篇自己文章的作者返回按id升序排序的结果表结果格式如下

Example 1:

Input: 
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+
Output: 
+------+
| id   |
+------+
| 4    |
| 7    |
+------+

Code1:
import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
		# 筛选出 author_id 和 viewer_id 相等的行
    author_viewed_own_articles = views[views['author_id'] == views['viewer_id']]
		# 提取唯一作者：从上表中提取唯一的 author_id
    unique_author = author_viewed_own_articles['author_id'].unique() 
		# 排序：对唯一作者 ID 进行排序
    unique_author = sorted(unique_author)
		# 创建并返回结果
    result_df = pd.DataFrame({'id': unique_author})
    
    return result_df

Invalid Tweets - 无效推文 (过滤字符长度)

Table: Tweets
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
tweet_id is the primary key (column with unique values) for this table.
This table contains all the tweets in a social media app.

1. 编写一个解决方案，找出无效推文的 ID。如果推文内容中使用的字符数严格大于 15，则该推文无效
2. 按任意顺序返回结果表
3. 结果格式如下

Example 1:

Input: 
Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+
Output: 
+----------+
| tweet_id |
+----------+
| 2        |
+----------+
Explanation: 
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.

import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
   
   invalid_tweets_df = tweets[tweets['content'].str.len() > 15]
   
   retult_df = invalid_tweets_df[['tweet_id']]

   return retult_df

Calculate Special Bonus - 计算特殊奖金（排序, 条件计算）

.sort_values(by='employee_id', ascending=True)：

sort_values() 是一个用于排序 DataFrame 的 Pandas 函数。
by='employee_id' 指定了要根据哪一列的值进行排序，在这个例子中是 employee_id 列。
ascending=True 表示按升序排序。升序意味着从小到大排列，如果设为 False，则表示降序排列。

.str.startswith('M')

Table: Employees
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+

employee_id 是该表的主键（具有唯一值的列）。
该表的每一行都注明了员工 ID、员工姓名和工资。

请写出计算每位员工奖金的解决方案。如果员工的 ID 是奇数，且员工姓名不是以 "M " 开头，则该员工的奖金为其工资的100%。否则奖金为0

返回按employee_id 排序的结果表

结果格式如下

Example 1:

Input: 
Employees table:
+-------------+---------+--------+
| employee_id | name    | salary |
+-------------+---------+--------+
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |
+-------------+---------+--------+
Output: 
+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |
+-------------+-------+
Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    # Create a new column 'bonus' with default value 0
    # 创建一个新列，其默认值为0
    employees['bonus'] = 0
    
    # Calculate bonus based on the conditions
    # 基于条件计算奖金
    employees.loc[(employees['employee_id'] % 2 != 0) & (~employees['name'].str.startswith('M')), 'bonus'] = employees['salary']
    
    # Select only the required columns and sort the result table by employee_id in ascending order
    # 只选择所需列，并按 employee_id 升序对结果表排序
    result_df = employees[['employee_id', 'bonus']].sort_values(by='employee_id', ascending=True)
    
    return result_df

Fix Names in a Table - 修复表中的名称 (排序, 大小写转换)

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| name           | varchar |
+----------------+---------+
# user_id 是该表的主键（具有唯一值的列）。
# 该表包含用户的 ID 和名称。名称只能由小写和大写字母组成

# 请写出一个解决方案，修正名称，使第一个字符为大写，其余为小写。
# 返回按user_id 排序的结果表
Example 1:

Input: 
Users table:
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | aLice |
| 2       | bOB   |
+---------+-------+
Output: 
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | Alice |
| 2       | Bob   |
+---------+-------+

Code1:
import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    # .str.capitalize()： pandas 提供的字符串操作方法。它会将 name 列的每个字符串的第一个字符变为大写，剩余字符变为小写
    users['name'] = users['name'].str.capitalize()
    # 按照 user_id 列升序排序
    result_df = users.sort_values(by = 'user_id', ascending=True)

Find Users With Valid E-Mails - 查找具有有效电子邮件的用户 (正则条件筛选)

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
| mail          | varchar |
+---------------+---------+
# user_id 是该表的主键（具有唯一值的列）。
# 本表包含网站注册用户的信息。部分电子邮件无效。

# 找出拥有有效电子邮件的用户
# 一个有效的电子邮件有一个前缀名和一个域：
# 前缀名称是一个字符串，可包含字母（大写或小写）、数字、下划线"_"、句号" . "和/或破折号"-" 。前缀名称必须以字母开头。
# 域名为"@leetcode.com"

Example 1:

Input: 
Users table:
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
| 5       | Marwan    | quarz#[email protected] |
| 6       | David     | [email protected]       |
| 7       | Shapiro   | [email protected]     |
+---------+-----------+-------------------------+
Output: 
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
+---------+-----------+-------------------------+

import pandas as pd
'''
contains(): 这是 pandas 提供的一个字符串方法，用于检查
每个元素（字符串）是否包含指定的模式或子串。它返回一个布尔
值的 Series，其中每个元素为 True 或 False，表示相应的字
符串是否匹配
'''


def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # 定义正则表达式
    regex_pattern = r'^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\.com$'
    # 过滤符合条件的行
    filtered_user = users[users['mail'].str.contains(regex_pattern, regex=True)]

    return filtered_user

Patients With a Condition - 患有某种疾病的患者 (正则条件筛选)

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+
patient_id 是该表的主键（具有唯一值的列）
包含 0 个或多个用空格分隔的代码。
该表包含医院病人的信息

请写出一个解决方案，以查找 I 型糖尿病患者的 patient_id、patient_name 和病情。
I 型糖尿病总是以DIAB1前缀开头
按任意顺序返回结果表

Example 1:

Input: 
Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |
+------------+--------------+--------------+
Output: 
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 | 
+------------+--------------+--------------+

import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    patients_df = patients[patients['conditions'].str.contains(r'\bDIAB1')]
    return patients_df

PreviousProjects of Pandas NextPart II

Last updated 1 year ago

Was this helpful?