하루에 하나씩

[SQL] (4Week) Modifying and Analyzing Data with SQL 본문

카테고리 없음

[SQL] (4Week) Modifying and Analyzing Data with SQL

JY SHIN 2022. 4. 26. 08:34

Methods for Modifying Data

  • Concatenating
  • Trimming
  • Changing case
  • Substring functions
  • Date and time strings

 

Data Governance and Profiling

  • Tips and tricks for using SQL for data science
  • Putting it all together

 

1. Working with Text Strings

 

- Concatenate or combine text strings

- Trim text strings

- Use the substring function 

- Change the case of your strings

 

Working with string Variables

- Retrieve the data in the format you need: Client vs server formatting

- Support Joins

- String Functions: Concatenate, Substring, Trim, Upper, Lower

 

Concatenations

Note: SQL server supports  + instead of ||

Trimming Strings

Trims the leading or trailing space from a string

  • TRIM
  • RTRIM
  • LTRIM

 

Substring 

Returns the specified number of characters from a particular position of a given string

''

substring example

Upper and Lower

2. Working with Date and Time Strings

- Describe the complexities of adjusting date and time strings

- Discuss the different formats in which dates and times are presented

- List and describe the 5 different functions in SQL that can be used to manipulate date and time strings

Date Formats

  • DATE: YYYY-MM-DD
  • DATETIME: YYYY-MM-DD HH:MI:SS
  • TIMESTAMP YYYY-MM-DD HH:MI: SS

 

If you query a DATETIME with :

You will get no results

 

SQLite Date Time Functions 

SQLite supports 5 date and time functions: 

Timestrings

A time string can be in any of the following formats

 

Modifiers

3. Date and Time String Examples

STRFTIME( , ) AS 

 

Compute Current Date

 

Compute Year, Month and Day for the Current Date

 

Compute the Hour, Minute and Second and Milliseconds from Current DATETIME

Compute Age Using Birthdate

4. Case statement

What is a Case Statement

Mimics if-then-else statement found in most programming languages

Can be used in SELECT, INSERT, UPDATE, and DELETE statements

 

Search Case Statement

 

 

5. Views

- A stored query

- Can add or remove columns without changing the schema

- Use it to encapsulate queries

- The view will be removed after the database connection has ended

 

Instead of creating a whole new table, sometimes we can create the illusion of a table by using a view

A view is essentially a stored query, and it helps us clean up our queries and simplify when we have to write.

In a view, you can add or remove columns without changing the schema. You're not actually writing the query to the database or anything, what you're doing is you are kind of storing it for the time being. This is really helpful or pays off when we use it to encapsulate queries. The syntax for this is you're just going to CREATE and you can either specify a temporary view or just create a view. You can also add in IF NOT EXISTS. So if it doesn't already exist, then you have the view name and you state what the conditions you want to go into the view.

 

It's only stored for the duration of a session. If you come back the next day and start a new session and you're having errors in your queries, it's probably because didn't crate the view again. 

 

AS is really the select statement. 

6. Data Governance and Profiling

Learning Objectives

- Define data governance and profiling

- Explain the importance of data governance and profiling your data appropriately

- Discuss methods of profiling your data

 

What is Data Profiling?

- Looking at descriptive statistics or object data information - examining data for completeness and accuracy

- Important to understand your data before you query it

 

Object Data Profile

- Number of rows

- Table size

- When the object was last updated

 

Column Data Profile

- Column data type

- Number of distinct values

- Number of rows with NULL values

- Descriptive statistics: maximum, average, and standard deviation for column values

 

Governance Best Practices

- Understand your read and write capabilities

- Clean up your environments

- Understand your promotion process

 

7. Using SQL for Data Science, Part1

Learning Objectives

- Discuss the importance of understanding your data when starting a new problem

- Discuss the importance of  understanding business needs before beginning data analysis

 

Working Through a Problem from Begining to End

  • Data Understanding 
  • Business understanding 
  • Profiling 
  • Start with SELECT
  • Test
  • Format & Comment
  • Review

 

Data Understanding

  • Most important step
  • Understanding relationships in your data
  • NULLvalues
  • String values
  • Dates and time

 

Subject Area Understanding

  • Until you gain business understanding, writing queries may take more time
  • Understanding where data joins are
  • Differentiating integers from strings
  • investing time to understand your subject will help later during data analysis

Business Understanding

Ask questions about the business problem you are solving

Hard to separate data and business understanding 

 

Beware of the Unspoken Need

"We want to predict whether or not a customer is likely to buy our product."

  • Which customers?
  • What product?
  • What is/should be excluded?
  • What is/should be counted from the past?

Moving Between Data and Business Understanding

 

8. Using SQL for Data Science, Part2

Learning Objectives

- Determine and map out data elements needed for a query

- Discuss strategies to use to write complex queries

- Explain common troubleshooting techniques

 

Profiling Data

  • Get into the details of your data
  • Create a data model and map the fields and tables you need
  • Consider joins and calculations
  • Understand any data quality or format issues

 

Start with SELECT

  • Start simple
  • Any query begins with SELECT statement
  • Add in special formatting
  • If using subqueries start with the innermost query and work outward 

 

Test and Trouble Shooting 

- Do not wait until the end to test queries

- Test after each join or filter

- Are you getting the results you expect?

- Start small and go step-by-step when troubleshooting a query

 

Format and Comment

- Use correct formatting and indentation

- Commet strategically

- Clean code and comments help when you revisit or hand off cide

 

Review

- Always review old queries

- Business rules

- Date changes

- Date Indicators

- Work the problem for beginning to end


Additional SQL Resources to Explore

Haven’t had enough SQL? Or do you want to keep practicing to improve your skills? SQL puzzles are a great way to do this! Below is one of my favorite resources for practicing SQL Puzzles. I use it all the time.

SQL Authority: SQL Puzzles

In addition, many of you may be taking this class with a desire to get a new job or position. Below is a resource that includes quizzes and is recommended by many recruiters to practice SQL for a data science interview.

SQLZOO

Comments