하루에 하나씩

[SQL] (2Week) Filtering, Sorting, and Calculating Data with SQL 본문

SQL

[SQL] (2Week) Filtering, Sorting, and Calculating Data with SQL

JY SHIN 2022. 4. 17. 19:12

1. Basics of Filtering with SQL

Why Filter?

Be specific about the data you want to retrieve

Reduce the number of records you retrieve

Increase query performance

Reduce the strain on the client application

Governance limitations

 

Which of the following did I call “huge benefits” of filtering directly with SQL instead of relying on the client application to do it?  Which of the following did I call “huge benefits” of filtering directly with SQL instead of relying on the client application to do it? < a, c, d > 

a. It reduces the amount of data you’re analyzing and that speeds up the query performance.

b. It increases the strain on the client-side of the application, but it’s worth it because of the greater specificity in the results.

c. It will often reduce the number of records we're retrieving.

d. It ensures you're getting the data that you want and need.

 

WHERE Clause Operators

 

Filtering on a Single Condition

 

Filtering on a Single value

Checking for Non-Matches

Filtering with a Range of Values

Filtering No Value

No value returned because there are no null values in the product name

- A null is very different than having a price that is zero.

- A null means that there is actually no data in this column. 

EX. Does the column representing the product names have some type of information for every record?

2. Advanced Filtering: IN, OR, and NOT

IN Operator

Specifies a range of conditions

Comma delimited list of values

Enclosed in ()

OR Operator

DBMS will not evaluate the second conditions in a WHERE clause if the first condition is met. 

Use for any row matching the specific conditions

IN vs. OR

In works the same as OR

Benefits of IN

  • Long list of options
  • IN executes faster than OR
  • Don't have to think about the order with IN
  • Can contain another SELECT

 

Q. Sometimes IN and OR can accomplish the same thing; however, generally speaking which of the following are the benefits of using IN?    (a, b, c)

a. IN executes faster than OR
b. With IN you don't have to think about the order in which you're placing your different conditions.
c. Using IN, we can use another SELECT for subqueries
d. IN gives you fewer options in how many things you can list but they are more targeted.

 

OR with AND

SQL processes the OR before the AND. And so one of the ways to limit this is to use parentheses.

 

Order of Operations 

Can contain AND and OR Operators

SQL processes AND before OR

Use()

* Don't rely on the default order of operations. You are better being specific and getting in the habit of using the ()

 

NOT Operator 

3. Using Wildcards in SQL 

What are WildCards?

Special character used to match parts of a value

Search pattern made from literal text, wildcard character, or a combination

Uses LIKE as an operator (though technically a predicate)

Can only be used with strings

Cannot be used for non-text data types

Helpful of r data scientists as they explore strings variables

 

Using % Wildcards

- % wildcard will not match NULLs

- NULL represents no value in a column

 

Underscore(_) Wild card

Matches a single character

Is not supported by DB2 

These produce the same results

 

Bracket [] Wildcard

Used to specify a set of characters in a specific location

Does not work with all DBMS

Does not work with SQLite

 

Downsides of Wildcards

Takes longer to run

Better to use another operator(if possible): =, <, => and etc.

Statements with wildcards will take longer to run if used at the end of search patterns

Placement of wildcards is important

 

 

Q. Which of the following describe the “downside” of wildcards mentioned in this lesson (select all that apply)?   (a, c)
a. You must take into account that there are different ways to use wildcards in different database management
systems. So that requires learning what works according to the database management system you are using.
b. Wildcards have to follow a certain order and so can be confusing when you are just starting to learn to use them.
c. Queries using wildcards take a little bit longer to run
d. Wildcards find a wider range of things, such as with a phrase that ends in something or starts with something.    

 

4. Sorting with ORDER BY 

Why Sort Data?

Data displayed appears in the order of the underlying tables

Updated and deleted data can change this order

Sequence of retrieved data cannot be assumed if order was not specified

Sorting data logically helps keep the information you on top

ORDER BY clause allows user to sort data by particular columns

 

Rules of ORDER BY 

Takes the name of one or more columns

Add a comma after each additional column name

Can sort by a column not retrieved

Must always be the last clause in a select statement

 

Sorting by Column Position

2 means 2nd column

3 means 3rd column, etc. 

 

Sort Direction

DESC descending order

ASC ascending order

Only applies to the column names it directly precedes

 

5. Math Operations

Multiplication Example

Total units on order multiplied by the unit price to calculate the total order cost

 

Order of Operations

Parentheses

Exponents

Multiplication

Division

Addition

Subtraction

"Please Excuse My Dear Aunt Sally"

 

Combining Math Operations

Unit price, minus my discount, divided by the quantity

6. Aggregate Functions

What are Aggregate Functions?

Used to summarize data

Finding the highest and lowest vales

Finding the total number of rows

Finding the average value

 

Aggregate Functions

AVERAGE Function

Rows containing NULL values are ignored by the AVERAGE function

 

COUNT Function

COUNT(*) - Counts all the rows in a table containing values or NULL values

COUNT(column) - Counts all the rows in a specific column ignoring NULL values

MAX and MIN Functions

NULL values are ignored by the MIN and MAX functions

 

SUM Aggregate Function

Using DISTINCT on Aggregate Functions 

If DISTINCT is not specified, ALL is assumed

Cannot use DISTINCT on COUNT(*)

No value to use with MIN and MAX functions

 

7. Grouping Data with SQL

Grouping Example

Counts customers after group on region rather than counting the whole table

Addtional GROUP BY Informaiton

GROUP BY clauses can contain multiple coumns

Every column in your SELECT statement must be present in a GROUP BY clause, except for aggregated calculations

NULLS will be grouped together if your GROUP BY column contains NULLs

 

HAVING Clause - Filtering for Groups

WHERE does not work for groups

WHERE filters on rows

Instead use HAVING clauses to filter for groups

 

Grouping Example

 

WHERE vs. HAVING

WHERE filters before data is grouped

HAVING filters after data is grouped

Rows eliminated by the WHERE clause will not be a included in the group

 

 

ORDER BY with GROUP BY 

ORDER BY sorts data

GROUP BY does not sort data

 

 

8. Putting It All Together

Filtering is Useful 

Narrowing down your results

Increasing query & application performance

Understanding your data:

  • Finding specific values
  • Finding a range of values
  • Finding blank values

Key SQL Clauses

 

9. SQL for Various Data Science Languages

 

In this class we’ve gone over relational databases and how SQL is used to retrieve data from them. However, because of the popularity and versatility of SQL, SQL is also used for many big data applications. Below are a few resources for how SQL is used with common big data and data science languages.

SQL for R

SQL for Spark

SQL with Hadoop

SQL for Python

'SQL' 카테고리의 다른 글

[SQL] (3Week) Subqueries and Joins in SQL  (0) 2022.04.24
[SQL] (1Week) Selecting and Retrieving Data with SQL  (0) 2022.04.13
Comments