일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | |||||
3 | 4 | 5 | 6 | 7 | 8 | 9 |
10 | 11 | 12 | 13 | 14 | 15 | 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | 28 | 29 | 30 |
31 |
- 빅데이터분석기사
- 태블로신병훈련소
- CourseraSQL
- 태블로 자격시험 독학
- 태블로 신병훈련소 후기
- 태블로 신병 훈련소 11기
- 태블로 신병훈련소
- 태블로 집합
- 태블로독학
- 태블로 씹어먹기
- TABLEAU Certificate
- 태블로초보
- 알약 모양
- 태블로 독학
- 태블로 리뷰
- 태블로 신병 훈련소 후기
- tableau
- 태블로 신병 훈련소
- 태블로
- SQL
- 태블로 자격증 독학
- python udemy
- Python
- 태블로 신병 훈련소 10기
- 태블로 무료 강의
- 범프차트
- 태블로 데스크탑
- 태블로 자격시험
- 데이터 시각화
- coursera
- Today
- Total
하루에 하나씩
[SQL] (2Week) Filtering, Sorting, and Calculating Data with SQL 본문
1. Basics of Filtering with SQL
Why Filter?
Be specific about the data you want to retrieve
Reduce the number of records you retrieve
Increase query performance
Reduce the strain on the client application
Governance limitations
Which of the following did I call “huge benefits” of filtering directly with SQL instead of relying on the client application to do it? Which of the following did I call “huge benefits” of filtering directly with SQL instead of relying on the client application to do it? < a, c, d >
a. It reduces the amount of data you’re analyzing and that speeds up the query performance.
b. It increases the strain on the client-side of the application, but it’s worth it because of the greater specificity in the results.
c. It will often reduce the number of records we're retrieving.
d. It ensures you're getting the data that you want and need.
WHERE Clause Operators
Filtering on a Single Condition
Filtering on a Single value
Checking for Non-Matches
Filtering with a Range of Values
Filtering No Value
No value returned because there are no null values in the product name
- A null is very different than having a price that is zero.
- A null means that there is actually no data in this column.
EX. Does the column representing the product names have some type of information for every record?
2. Advanced Filtering: IN, OR, and NOT
IN Operator
Specifies a range of conditions
Comma delimited list of values
Enclosed in ()
OR Operator
DBMS will not evaluate the second conditions in a WHERE clause if the first condition is met.
Use for any row matching the specific conditions
IN vs. OR
In works the same as OR
Benefits of IN
- Long list of options
- IN executes faster than OR
- Don't have to think about the order with IN
- Can contain another SELECT
Q. Sometimes IN and OR can accomplish the same thing; however, generally speaking which of the following are the benefits of using IN? (a, b, c)
a. IN executes faster than OR
b. With IN you don't have to think about the order in which you're placing your different conditions.
c. Using IN, we can use another SELECT for subqueries
d. IN gives you fewer options in how many things you can list but they are more targeted.
OR with AND
SQL processes the OR before the AND. And so one of the ways to limit this is to use parentheses.
Order of Operations
Can contain AND and OR Operators
SQL processes AND before OR
Use()
* Don't rely on the default order of operations. You are better being specific and getting in the habit of using the ()
NOT Operator
3. Using Wildcards in SQL
What are WildCards?
Special character used to match parts of a value
Search pattern made from literal text, wildcard character, or a combination
Uses LIKE as an operator (though technically a predicate)
Can only be used with strings
Cannot be used for non-text data types
Helpful of r data scientists as they explore strings variables
Using % Wildcards
- % wildcard will not match NULLs
- NULL represents no value in a column
Underscore(_) Wild card
Matches a single character
Is not supported by DB2
These produce the same results
Bracket [] Wildcard
Used to specify a set of characters in a specific location
Does not work with all DBMS
Does not work with SQLite
Downsides of Wildcards
Takes longer to run
Better to use another operator(if possible): =, <, => and etc.
Statements with wildcards will take longer to run if used at the end of search patterns
Placement of wildcards is important
Q. Which of the following describe the “downside” of wildcards mentioned in this lesson (select all that apply)? (a, c)
a. You must take into account that there are different ways to use wildcards in different database management systems. So that requires learning what works according to the database management system you are using.
b. Wildcards have to follow a certain order and so can be confusing when you are just starting to learn to use them.
c. Queries using wildcards take a little bit longer to run
d. Wildcards find a wider range of things, such as with a phrase that ends in something or starts with something.
4. Sorting with ORDER BY
Why Sort Data?
Data displayed appears in the order of the underlying tables
Updated and deleted data can change this order
Sequence of retrieved data cannot be assumed if order was not specified
Sorting data logically helps keep the information you on top
ORDER BY clause allows user to sort data by particular columns
Rules of ORDER BY
Takes the name of one or more columns
Add a comma after each additional column name
Can sort by a column not retrieved
Must always be the last clause in a select statement
Sorting by Column Position
2 means 2nd column
3 means 3rd column, etc.
Sort Direction
DESC descending order
ASC ascending order
Only applies to the column names it directly precedes
5. Math Operations
Multiplication Example
Total units on order multiplied by the unit price to calculate the total order cost
Order of Operations
Parentheses
Exponents
Multiplication
Division
Addition
Subtraction
"Please Excuse My Dear Aunt Sally"
Combining Math Operations
Unit price, minus my discount, divided by the quantity
6. Aggregate Functions
What are Aggregate Functions?
Used to summarize data
Finding the highest and lowest vales
Finding the total number of rows
Finding the average value
Aggregate Functions
AVERAGE Function
Rows containing NULL values are ignored by the AVERAGE function
COUNT Function
COUNT(*) - Counts all the rows in a table containing values or NULL values
COUNT(column) - Counts all the rows in a specific column ignoring NULL values
MAX and MIN Functions
NULL values are ignored by the MIN and MAX functions
SUM Aggregate Function
Using DISTINCT on Aggregate Functions
If DISTINCT is not specified, ALL is assumed
Cannot use DISTINCT on COUNT(*)
No value to use with MIN and MAX functions
7. Grouping Data with SQL
Grouping Example
Counts customers after group on region rather than counting the whole table
Addtional GROUP BY Informaiton
GROUP BY clauses can contain multiple coumns
Every column in your SELECT statement must be present in a GROUP BY clause, except for aggregated calculations
NULLS will be grouped together if your GROUP BY column contains NULLs
HAVING Clause - Filtering for Groups
WHERE does not work for groups
WHERE filters on rows
Instead use HAVING clauses to filter for groups
Grouping Example
WHERE vs. HAVING
WHERE filters before data is grouped
HAVING filters after data is grouped
Rows eliminated by the WHERE clause will not be a included in the group
ORDER BY with GROUP BY
ORDER BY sorts data
GROUP BY does not sort data
8. Putting It All Together
Filtering is Useful
Narrowing down your results
Increasing query & application performance
Understanding your data:
- Finding specific values
- Finding a range of values
- Finding blank values
Key SQL Clauses
9. SQL for Various Data Science Languages
In this class we’ve gone over relational databases and how SQL is used to retrieve data from them. However, because of the popularity and versatility of SQL, SQL is also used for many big data applications. Below are a few resources for how SQL is used with common big data and data science languages.
SQL for R
SQL for Spark
SQL with Hadoop
SQL for Python
'SQL' 카테고리의 다른 글
[SQL] (3Week) Subqueries and Joins in SQL (0) | 2022.04.24 |
---|---|
[SQL] (1Week) Selecting and Retrieving Data with SQL (0) | 2022.04.13 |