Part I
Big Countries (筛选)
A country is big if:一个国家如果很大:
it has an area of at least three million (i.e.,
3000000 km2), or面积至少达到 300
万(即3000000平方公里),或
it has a population of at least twenty-five million (i.e.,
25000000).人口至少有两千五百万(即
25000000)。
Write a solution to find the name, population, and area of the big countries.写出求大国名称、人口和面积的解法。
Example 1:
Input:
World table:
+-------------+-----------+---------+------------+--------------+
| name | continent | area | population | gdp |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
| Albania | Europe | 28748 | 2831741 | 12960000000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000000 |
| Andorra | Europe | 468 | 78115 | 3712000000 |
| Angola | Africa | 1246700 | 20609294 | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output:
+-------------+------------+---------+
| name | population | area |
+-------------+------------+---------+
| Afghanistan | 25500100 | 652230 |
| Algeria | 37100000 | 2381741 |
+-------------+------------+---------+Code:
import pandas as pd
def big_countries(world: pd.DataFrame) -> pd.DataFrame:
df = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
return df[['name','population','area']]Recyclable and Low Fat Products - 可回收和低脂产品 (筛选)
Write a solution to find the ids of products that are both low fat and recyclable.写一个解决方案,找出既低脂又可回收的产品 ID。
Return the result table in any order.按任意顺序返回结果表。
Example 1:
Input:
Products table:
+-------------+----------+------------+
| product_id | low_fats | recyclable |
+-------------+----------+------------+
| 0 | Y | N |
| 1 | Y | Y |
| 2 | N | Y |
| 3 | Y | Y |
| 4 | N | N |
+-------------+----------+------------+
Output:
+-------------+
| product_id |
+-------------+
| 1 |
| 3 |
+-------------+
Explanation: Only products 1 and 3 are both low fat and recyclable.Code1:
import pandas as pd
def find_products(products: pd.DataFrame) -> pd.DataFrame:
products_new = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
return products_new[['product_id']]Customers Who Never Order - 从不订购的客户 (重名, 筛选)
Write a solution to find all customers who never order anything.写一个解决方案,找出所有从不订购任何东西的客户。
Return the result table in any order.按任意顺序返回结果表。
Approach 方法
使用
isin()方法确定客户DataFrame中的每个id是否存在于订单DataFrame中。
应用否定运算符
(~)选择 ID 不在订单列表中的客户。
筛选后,只选择客户
DataFrame中的姓名列
将列名称重命名为 "客户",使结果更加清晰。
Input:
Customers table:
+----+-------+
| id | name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
Output:
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+Code1:
import pandas as pd
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
filtered_customers = customers[~customers['id'].isin(orders['customerId'])]
result = filtered_customers[['name']].rename(columns={'name': 'Customers'})
return resultArticle View I - 文章观点 I (排序, 合并)
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |
+---------------+---------+
该表的每一行都表示某位读者在某日浏览了某篇文章(由某位作者撰写)。
请注意, author_id 和 viewer_id 表示同一个人。写一个解决方案,找出所有至少浏览过一篇自己文章的作者 返回按id升序排序的结果表 结果格式如下
Example 1:
Input:
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
Output:
+------+
| id |
+------+
| 4 |
| 7 |
+------+Code1:
import pandas as pd
def article_views(views: pd.DataFrame) -> pd.DataFrame:
# 筛选出 author_id 和 viewer_id 相等的行
author_viewed_own_articles = views[views['author_id'] == views['viewer_id']]
# 提取唯一作者:从上表中提取唯一的 author_id
unique_author = author_viewed_own_articles['author_id'].unique()
# 排序:对唯一作者 ID 进行排序
unique_author = sorted(unique_author)
# 创建并返回结果
result_df = pd.DataFrame({'id': unique_author})
return result_dfInvalid Tweets - 无效推文 (过滤字符长度)
Table: Tweets
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| tweet_id | int |
| content | varchar |
+----------------+---------+
tweet_id is the primary key (column with unique values) for this table.
This table contains all the tweets in a social media app.
1. 编写一个解决方案,找出无效推文的 ID。如果推文内容中使用的字符数严格大于 15,则该推文无效
2. 按任意顺序返回结果表
3. 结果格式如下Example 1:
Input:
Tweets table:
+----------+----------------------------------+
| tweet_id | content |
+----------+----------------------------------+
| 1 | Vote for Biden |
| 2 | Let us make America great again! |
+----------+----------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 2 |
+----------+
Explanation:
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.import pandas as pd
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
invalid_tweets_df = tweets[tweets['content'].str.len() > 15]
retult_df = invalid_tweets_df[['tweet_id']]
return retult_dfCalculate Special Bonus - 计算特殊奖金(排序, 条件计算)
.sort_values(by='employee_id', ascending=True):
sort_values()是一个用于排序 DataFrame 的 Pandas 函数。by='employee_id'指定了要根据哪一列的值进行排序,在这个例子中是employee_id列。ascending=True表示按升序排序。升序意味着从小到大排列,如果设为False,则表示降序排列。
.str.startswith('M')
Table: Employees
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| employee_id | int |
| name | varchar |
| salary | int |
+-------------+---------+
employee_id 是该表的主键(具有唯一值的列)。
该表的每一行都注明了员工 ID、员工姓名和工资。请写出计算每位员工奖金的解决方案。如果员工的 ID 是奇数,且员工姓名不是以 "M " 开头 ,则该员工的 奖金为其工资的100%。否则奖金为0
返回按employee_id 排序的结果表
结果格式如下
Example 1:
Input:
Employees table:
+-------------+---------+--------+
| employee_id | name | salary |
+-------------+---------+--------+
| 2 | Meir | 3000 |
| 3 | Michael | 3800 |
| 7 | Addilyn | 7400 |
| 8 | Juan | 6100 |
| 9 | Kannon | 7700 |
+-------------+---------+--------+
Output:
+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2 | 0 |
| 3 | 0 |
| 7 | 7400 |
| 8 | 0 |
| 9 | 7700 |
+-------------+-------+
Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.import pandas as pd
def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
# Create a new column 'bonus' with default value 0
# 创建一个新列,其默认值为0
employees['bonus'] = 0
# Calculate bonus based on the conditions
# 基于条件计算奖金
employees.loc[(employees['employee_id'] % 2 != 0) & (~employees['name'].str.startswith('M')), 'bonus'] = employees['salary']
# Select only the required columns and sort the result table by employee_id in ascending order
# 只选择所需列,并按 employee_id 升序对结果表排序
result_df = employees[['employee_id', 'bonus']].sort_values(by='employee_id', ascending=True)
return result_dfFix Names in a Table - 修复表中的名称 (排序, 大小写转换)
Table: Users
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| user_id | int |
| name | varchar |
+----------------+---------+
# user_id 是该表的主键(具有唯一值的列)。
# 该表包含用户的 ID 和名称。名称只能由小写和大写字母组成
# 请写出一个解决方案,修正名称,使第一个字符为大写,其余为小写。
# 返回按user_id 排序的结果表
Example 1:
Input:
Users table:
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | aLice |
| 2 | bOB |
+---------+-------+
Output:
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | Alice |
| 2 | Bob |
+---------+-------+Code1:
import pandas as pd
def fix_names(users: pd.DataFrame) -> pd.DataFrame:
# .str.capitalize(): pandas 提供的字符串操作方法。它会将 name 列的每个字符串的第一个字符变为大写,剩余字符变为小写
users['name'] = users['name'].str.capitalize()
# 按照 user_id 列升序排序
result_df = users.sort_values(by = 'user_id', ascending=True)Find Users With Valid E-Mails - 查找具有有效电子邮件的用户 (正则条件筛选)
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
| mail | varchar |
+---------------+---------+
# user_id 是该表的主键(具有唯一值的列)。
# 本表包含网站注册用户的信息。部分电子邮件无效。
# 找出拥有有效电子邮件的用户
# 一个有效的电子邮件有一个前缀名和一个域:
# 前缀名称是一个字符串,可包含字母(大写或小写)、数字、下划线"_"、句号" . "和/或破折号"-" 。前缀名称必须以字母开头。
# 域名为"@leetcode.com"
Example 1:
Input:
Users table:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | [email protected] |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | [email protected] |
| 4 | Sally | [email protected] |
| 5 | Marwan | quarz#[email protected] |
| 6 | David | [email protected] |
| 7 | Shapiro | [email protected] |
+---------+-----------+-------------------------+
Output:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | [email protected] |
| 3 | Annabelle | [email protected] |
| 4 | Sally | [email protected] |
+---------+-----------+-------------------------+import pandas as pd
'''
contains(): 这是 pandas 提供的一个字符串方法,用于检查
每个元素(字符串)是否包含指定的模式或子串。它返回一个布尔
值的 Series,其中每个元素为 True 或 False,表示相应的字
符串是否匹配
'''
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
# 定义正则表达式
regex_pattern = r'^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\.com$'
# 过滤符合条件的行
filtered_user = users[users['mail'].str.contains(regex_pattern, regex=True)]
return filtered_userPatients With a Condition - 患有某种疾病的患者 (正则条件筛选)
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| patient_id | int |
| patient_name | varchar |
| conditions | varchar |
+--------------+---------+
patient_id 是该表的主键(具有唯一值的列)
包含 0 个或多个用空格分隔的代码。
该表包含医院病人的信息
请写出一个解决方案,以查找 I 型糖尿病患者的 patient_id、patient_name 和病情。
I 型糖尿病总是以DIAB1前缀开头
按任意顺序返回结果表
Example 1:
Input:
Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 1 | Daniel | YFEV COUGH |
| 2 | Alice | |
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
| 5 | Alain | DIAB201 |
+------------+--------------+--------------+
Output:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
+------------+--------------+--------------+import pandas as pd
def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
patients_df = patients[patients['conditions'].str.contains(r'\bDIAB1')]
return patients_df
Last updated
Was this helpful?