# Exercises

## CRUD

### insert() exercises

#### Exercise

* Insert one document to pypeople collection
  * \_id:0, name: Jisu, age: 24
* 결과

```python
{'_id': 0, 'age': 24, 'name': 'Jisu’}
```

* Insert multiple document
  * \_id: 1, name: Deayoung, age: 51
  * \_id: 2, name: Youngjin, age: 24
  * Print all documents
* 결과

```python
 [{'_id': 0, 'age': 24, 'name': 'Jisu’},
  {'_id': 1, 'age': 51, 'name': 'Deayoung’},
  {'_id': 2, 'age': 24, 'name': 'Youngjin'}]
```

### find() Exercises

#### Exercise 1

#### Preliminary

* Import `people.json` into `lab4.people` (i.e. database name is `lab4` and collection name is `people`)

```bash
$ mongoimport -d lab4 -c people people.json
```

#### Questions

* Find the documents whose age under 22.
* Find the documents whose age is between 22 and 26 (inclusive).
* Find the document whose name is “Lee” suppressing the profile.
* Find the documents whose name is not Lee.
* Find the documents whose name is “Lee” or “Park”.
* Find the documents whose name is not “Kim” or “Park” or “Lee”.

### Exercise 2

#### Preliminary

* Import `inventory.json` into lab4.inventory

#### Questions

* Find documents where the *tags* field contains below elements.
  * appliance, school, and book.
  * Hint: recall `$all` operator
* Find all documents but select at most three items in the *tags* field.
  * Hint: recall `$slice` operator
* Find all documents but select items from the second to the fourth in the *tags* field.
  * Hint: recall `$slice` operator with `[skip, limit]`
* Find documents that have brown color in the *qty* field.
  * Hint: recall `$elemMatch` operator

### Exercise 3

#### Preliminary

* Import `store.json` to the store collection

#### Questions

* Find documents where *buyer*'s *name* is "J.S. Lee"
  * Hint: Querying embedded document.
* Print purchased *items* where a *name* in a document is "iPhone Xs"
  * Hint: Querying embedded document, Projection
* Print the buyer’s purchase method where *items* contain the document that price is more than 2000$
  * Hint: < array.field >, Projection
* Find documents that don’t have carrier as key in attribute where the name is "Apple Watch Series 4"
  * Hint: `$elemMatch`, `$exists`

### **Exercise 4**

#### **Preliminary**

* Import `store.json` to the store collection
  * Same with Exercise 3!

#### Questions

* Find documents where buyer's name is "J.S. Lee"
  * Hint: Querying embedded document.&#x20;
* Print purchased items where a name in a document is "iPhone Xs".&#x20;
  * Hint: Querying embedded document, Projection&#x20;
* Print the buyer’s purchase method where items contain the document that price is more than 2000$.&#x20;
  * Hint: `<array.field>`, Projection&#x20;
* Find documents that don’t have carrier as key in attribute where the name is "Apple Watch Series 4"&#x20;
  * Hint: `$elemMatch`, `$exists`

### **Exercise 5**

#### Preliminary

* Import `store.json` to the store collection
  * Same with Exercise 3!

#### Questions

* Get the first three documents and print buyers field only.
* Skip a document and then print two documents
* Count documents in the store collection.

### **Exercise 6**

#### Preliminary

* Using `grade.json` (import it using mongoimport)

#### Questions

* The grade’s document can have among three types:
  * quiz, exam or homework
* Student id (sid) is from 0 to 99
* A document is lost, which contains a quiz score\
  1\. print lost document’s sid.\
  2\. Insert the quiz score (80) to the corresponding sid\
  3\. print the three students that have the highest quiz score.
* A simple example of the dataset looks like ...

```javascript
{"sid": 0, "type": "homework", "score": 69}
{"sid": 0, "type": "quiz" , "score": 29}
{"sid": 0, "type": "exam", "score": 86}
{"sid": 1, "type": "homework", "score": 2}
{"sid": 1, "type": "quiz" , "score": 6}
{"sid": 1, "type": "exam", "score": 67}
{"sid": 2, "type": "homework", "score": 13}
{"sid": 2, "type": "quiz" , "score": 91}
{"sid": 2, "type": "exam", "score": 76}
...
```

### **Exercise 7**

#### Preliminary

* Using `grade.json` (import it using mongoimport)
  * Same with exercise 6

#### Questions

* Write the score searching script
  * Search documents by filtering *type* and *score* fields

#### Results

```
search > homework 13
sid homework
2 13
74 13
```

```
search > quiz 41-49
sid quiz
10 46
17 48
19 44
...
96 49
98 41
```

### update() Exercise

Set a 'location' field to 'Seoul' to a document.

* The document has a 'name‘, 'Deayoung‘.&#x20;

```python
[{'_id': 0, 'age': 24, 'name': 'Jisu'},
 {'_id': 1, 'age': 51, 'location': 'Seoul', 'name': 'De ayoung'},
 {'_id': 2, 'age': 24, 'name': 'Youngjin'}] 
```

* Set a 'location' field to 'Suwon' to documents.
  * The documents have a 'age‘, 24.

```python
[{'_id': 0, 'age': 24, 'location': 'Suwon', 'name': 'Jisu'},
 {'_id': 1, 'age': 51, 'location': 'Seoul', 'name': 'De ayoung'},
 {'_id': 2, 'age': 24, 'location': 'Suwon', 'name': 'Y oungjin'}]
```

### delete() exercise

Delete documents

* * Remove documents that have a 'location‘, 'Suwon‘.
* Print all documents

```
{'_id': 1, 'age': 51, 'location': 'Seoul', 'name': 'Deayoung'}
```

​

### explain() Exercise

* Print a query plan using executionStats corresponding to the query that finds writer, Kim.
* Using blog.json

```python
 "executionStats" : {
…
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 4,
                "nReturned": 1,
                "executionStages": {
                               "stage": "COLLSCAN",
…
```

## CRUD(Advanced)

* Use a ex1 database and imdb collection.

* We are going to make a movie database like IMDb.

* Each movie has following information.

  * title: String
  * director: String
  * genre: List(String)
  * score: Double

* **Insert a movie**
  * Write a script that inserts movie information to a movie collection.&#x20;
  * Type the movie information as follows:
  * > Movie Title: **Avengers: Infinity War**
    >
    > Director: **Anthony Russo and Joe Russo**
    >
    > Genre: **Action, Adventure, Drama**
    >
    > score: **8.6**
    >
    > \[INFO] Success to insert!
  * Plain: Terminal output&#x20;
  * Bold: User input&#x20;
  * *Hint: use a input() function*

* Confirm that the insertion is operated properly.&#x20;

* Print only movie’s title(s)

```python
[INFO] Movie list
Avengers: Infinity War
```

* Insert three movies more. Use **insert\_many**
  * > Movie Title: Bohemian Rhapsody
    >
    > Director: Bryan Singer
    >
    > Genre: Biography, Drama, Music
    >
    > score: 8.1
    >
    > Quit (Y/n)? n
  * > Movie Title: Happy Death Day 2U&#x20;
    >
    > Director: Christopher Landon
    >
    > Genre:  Drama, Horror, Mystery
    >
    > score: 6.6
    >
    > Quit (Y/n)? n
  * > Movie Title: Spider-Man: Into the Spider-Verse&#x20;
    >
    > Director:  Bob Persichetti, Peter Ramsey
    >
    > Genre:  Animation, Action, Adventure
    >
    > score: 8.6
    >
    > Quit (Y/n)? Y
    >
    > \[INFO] Success to insert!
* Confirm that the insertion is operated properly.
* Print only movie’s title(s)

```python
[INFO] Movie list
Avengers: Infinity War
Bohemian Rhapsody
Happy Death Day 2U
Spider-Man: Into the Spider-Verse 
```

* **Update** movie information
  * Querying movie title and update the information
  * > \[INFO] Enter the information which you want to edit
    >
    > Movie Title: **Avengers: Infinity War**
    >
    > Director: *Anthony Russo and Joe Russo*
    >
    > Genre: *Action, Adventure, Drama*
    >
    > score: *8.6*
    >
    > \[INFO] Success to update!
  * Plain: Terminal output
  * Bold: query
  * Italic: update
* Print all documents

```python
[INFO] Movie list
[{'_id': ObjectId('5c76acb565d1016cd33d4a5d'),
  'director': 'Anthony Russo and Joe Russo',
  'genre': ['Action', ' Adventure', ' Drama'],
  'score': 8.6,
  'title': 'Avengers: Infinity War'},
 {'_id': ObjectId('5c76ad0d65d1016cd33d4a5e'),
  'director': 'Bryan Singer',
  'genre': ['Biography', ' Drama', ' Music '],
  'score': 8.1,
  'title': 'Bohemian Rhapsody'},
 {'_id': ObjectId('5c76ad0d65d1016cd33d4a5f'),
  'director': 'Christopher Landon',
  'genre': ['Drama', ' Horror', ' Mystery'],
  'score': 6.6,
  'title': 'Happy Death Day 2U'},
 {'_id': ObjectId('5c76ad0d65d1016cd33d4a60'),
  'director': 'Bob Persichetti, Peter Ramsey',
  'genre': ['Animation', ' Action', ' Adventure'],
  'score': 8.6,
  'title': 'Spider-Man: Into the Spider-Verse'}]
```

* Remove movie (Plain: Output, Bold: User input)
* Remove a document
  * > \[INFO] Enter the title which you want to remove
    >
    > Movie Title: **Happy Death Day 2U**
    >
    > \[INFO] Are you sure? (Y/n) **Y**
    >
    > \[INFO] Deleted
* Result

```python
[INFO] Movie list
Avengers: Infinity War
Bohemian Rhapsody
Spider-Man: Into the Spider-Verse 
```

## Index

### Exercise 1

1. Create an index in ascending order for the writer field and get the index information. :grinning:

   ```python
   {'_id_': {'key': [('_id', 1)], 'ns': 'lab6.blog', 'v': 2},
   'writer_1': {'key': [('writer', 1)], 'ns': 'lab6.blog', 'v': 2}} 
   ```

2. Again, print a query plan using executionStats corresponding to the query that finds a writer, Kim.

   ```python
    "executionStats" : {
   …
                   "executionTimeMillis" : 0,
                   "totalKeysExamined" : 1,
                   "totalDocsExamined" : 1,
                   "nReturned": 1,
                   "executionStages": {
                                 "inputStage": {
                                            "stage": "IXSCAN",
   …
   ```

3. Create the compound index on id in descending order and writer in ascending order.

   ```python
   {'_id_': {'key': [('_id', 1)], 'ns': 'lab6.blog', 'v': 2}, 
   'id_-1_writer_1': {'key': [('id', -1), ('writer', 1)], 'ns': 'lab6.blog', 'v': 2}, 
   'writer_1': {'key': [('writer', 1)], 'ns': 'lab6.blog', 'v': 2}} 
   ```

4. Drop all indexes on the blog collection.

   ```python
   {'_id_': {'key': [('_id', 1)], 'ns': 'lab6.blog', 'v': 2}}
   ```

### Exercise 2

* Index 생성

  * Create a unique index to doc\_id in ascending order
  * Try to build a unique index to line\_num
  * Create a sparse index to intersect in ascending order
  * Create a unique and sparse index to intersect\_id in ascending order
  * Check the index information

* Compare the results below two cases. (metro collection)

  * Show the first two documents, sorting with doc\_id in ascending order.

  * Using intersect index by hint(), show the first two documents

  * Sort doc\_id in ascending order

  * Use explain() above each case (executionStats).

  * No Index

    ```python
     "executionStats" : {
    …
    	 'executionTimeMillis': 155,
    	 'nReturned': 2,
    	 'totalDocsExamined': 186024,
    	 'totalKeysExamined': 0
    	}
    ```

  * Indexed

    ```python
     "executionStats" : {
    …
    	'executionTimeMillis': 21,
    	'nReturned': 2,
    	'totalDocsExamined': 2,
    	'totalKeysExamined': 2
    	}
    ```

## GeoSpatial Index

### Exercise 1

* Create a geospatial index (2dsphere) for the states collection
  * Get information on states collection’s indexes

    ```python
    {'_id_': {'key': [('_id', 1)], 'ns': 'geo.states', 'v': 2}, 
    'loc_2dsphere': 
    {'2dsphereIndexVersion': 3, 'key': [('loc', '2dsphere')], 'ns': 'geo.states', 'v': 2}}
    ```
* Remove the geospatial index from the states collection
  * Get information on states collection’s indexes

    ```python
    {'_id_': {'v': 2, 'key': [('_id', 1)], 'ns': ‘geo.states'}}
    ```

### Exercise 2

* Use the restaurant collection in the geo database

1. Create a valid query and print the query plan using executionStats•Find documents between '10200‘ and '10280‘ in zipcode
2. Create an index to zipcode (Hint: Embedded Document)
3. Execute #1, again.

* &#x20;No Index

  ```python
    "executionStats" : {
   …
                  "executionTimeMillis" : 11,
                  "totalKeysExamined" : 0,
                  "totalDocsExamined" : 25359,
                  "nReturned": 21,
  …
  ```

* Indexed

  ```python
   "executionStats" : {
  …
                  "executionTimeMillis" : 0,
                  "totalKeysExamined" : 21,
                  "totalDocsExamined" : 21,
                  "nReturned": 21,
  …
  ```

* Use the restaurant collection in the geo database

1. Create a query and print the query plan using executionStats•Find documents that have at least the grade is ‘B’ in the grades.grade field.
2. Create an index to grade (Hint: Indexing on Array)
3. Execute #1, again.

* No Index

  ```python
   "executionStats" : {
  …
                  "executionTimeMillis" : 17,
                  "totalKeysExamined" : 0,
                  "totalDocsExamined" : 25359,
                  "nReturned": 8280,
  …
  ```
* Indexed

  ```python
   "executionStats" : {
  …
                  "executionTimeMillis" : 13,
                  "totalKeysExamined" : 8280,
                  "totalDocsExamined" : 8280,
                  "nReturned": 8280,
  …
  ```

### Exercise 3

* Find international airports in the state that has the above zip code (10044)
  * Sorting the name in ascending order&#x20;
  * Print state’s name and airport name each.

    * Hint: states and small\_zip collections

    ```python
    [{'name': 'Albany County'}, 
    {'name': 'Greater Buffalo Intl'}, 
    {'name': 'Greater Rochester International'}, 
    {'name': 'John F Kennedy Intl'}, 
    {'name': 'La Guardia'}, 
    {'name': 'Massena Intl'}, 
    {'name': 'Niagara Falls Intl'}, 
    {'name': 'Ogdensburg Intl'}, 
    {'name': 'Stewart Intl'}, 
    {'name': 'Syracuse Hancock Intl'}, 
    {'name': 'Watertown International'}] 
    ```
* Find Korean restaurants that are located at less 2km from international airports.
  * Warn: should create 2dsphere index to address for restaurants collection.

![](https://2046636869-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lb7Vg1sUEHzMkgFpUvH%2F-Ldnzdtz8rgVmLD9qhCA%2F-LdoIPpZFkXDmZvkkOms%2Fex.png?alt=media\&token=e9112489-ca4d-4ffd-b0aa-f30e61393486)

* Find states that are intersect to the flight path (trajectory).&#x20;
  * Print names including source and destination states.&#x20;

![](https://2046636869-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lb7Vg1sUEHzMkgFpUvH%2F-Ldnzdtz8rgVmLD9qhCA%2F-LdoImMimuG7M5dZS3GK%2Fex2.png?alt=media\&token=0dafb620-1a90-41a0-9aee-5a09fe49385a)

* Make Geojson format whose type is LineString.&#x20;

  * Find the two coordinates (LAX, DTW) Result

* Result

  ```python
  [{'name': 'Arizona'},
  {'name': 'California'},
  {'name': 'Colorado'},
  {'name': 'Illinois'},
  {'name': 'Iowa'},
  {'name': 'Kansas'},
  {'name': 'Michigan'},
  {'name': 'Nebraska'}, 
  'name': 'Nevada'},
  {'name': 'Utah'}]
  ```

### Exercise 4

* Using the restaurants collection.

* Find restaurants whose names include "Kimchi".&#x20;

* Print only name, zipcodeand address

  ```python
  { "zipcode" : 11231, "address" : "Smith Street 478" }
  { "zipcode" : 11238, "address" : "Washington Avenue 766" }
  { "zipcode" : 10036, "address" : "West 48 Street 18-20" }
  ```

* Using the restaurants and states collections.

* Find restaurants that have "NolbuRestaurant＂(restaurants collection)&#x20;

* Print the state (states collection)

  ```python
  New York
  ```

* Using restaurants.json&#x20;

* Among the "Korean" restaurants, find a restaurant that has "Ramen"but "Izakaya“.&#x20;

* Then, find a restaurant near above restaurant in 150m.

  ```python
  { '_id' : ObjectId('5c8626739f44c9f6768035da'),
    'address' : {'building' : '8', 'coord': [-73.9916305, 40.7247106],
                 'street': 'Extra Pl', 'zipcode': '10003'},
    'borough': 'Manhattan',
    'cuisine': 'Japanese',
    'grades': [{'date': datetime.datetime(2015, 1, 20, 0, 0),
                 'grade': 'Not Yet Graded',
                 'score': 21}],
    'name': 'Ko Ep, Llc',
    'restaurant_id': '50015854'}
  ```

## Text Index

### Exercise 1

* Using restaurants&#x20;
* Build a text index to the 'name' field.&#x20;
* Find restaurants whose names include 'Kyochon‘ Use text index&#x20;
* Print only borough and name fields

  ```python
  { "borough" : "Queens", "name" : "Kyochon Chicken" }
  { "borough" : "Manhattan", "name" : "Kyochon" }
  ```

### Exercise 2

* Take a look the index information in map and countries collections.

* Find universities in Korea.
  * Get geometry information in the countries collection.&#x20;
  * keyword -properties.admin: “South Korea”&#x20;

* Return ten documents with sorting by “properties.name”, and show the name only.&#x20;

  * Note that there is no ‘properties.name’ in a document; therefore, remove the document.

  ```python
  [{'properties': {'name': 'GIST 광주 과학 기술원'}},
   {'properties': {'name': 'ICT폴리택대학'}},
   {'properties': {'name': 'KAIST 문지 캠퍼스'}},
   {'properties': {'name': 'KC대학교'}},
   {'properties': {'name': 'KT원주연수원'}},
   {'properties': {'name': 'U1대학교 아산캠퍼스'}},
   {'properties': {'name': '가천대학교 글로벌캠퍼스'}},
   {'properties': {'name': '가천대학교 메디컬캠퍼스'}},
   {'properties': {'name': '가톨릭관동대학교'}},
   {'properties': {'name': '가톨릭대학교 성신교정'}}]
  ```

* Search the ‘properties.name’ that includes ‘서울대입구’&#x20;

* Using regular expression ($regex) or a text index.

* Sort properties.name in ascending order and display only a name field as below.

  ```python
  [{'properties': {'name': '서울대입구 삼성아파트'}},
   {'properties': {'name': '서울대입구 풍림아이원 1단지'}},
   {'properties': {'name': '서울대입구 풍림아이원 2단지'}}]
  ```

## Aggregation Framework

### Exercise 1

* Use restaurants.json&#x20;
* Using aggregation framework, print documents below.&#x20;

  1\. Find Korean cuisine&#x20;

  2\. Group by borough and count the cuisines.

  ```python
  [{'_id': 'Brooklyn', 'count': 16},
  {'_id': 'Queens', 'count': 167},
  {'_id': 'Staten Island', 'count': 1},
  {'_id': 'Manhattan', 'count': 78}]
  ```

### Exercise 2

* Using aggregation framework, print documents below.&#x20;

  1\. Find Korean cuisine&#x20;

  2\. Unwind grades&#x20;

  3\. Group by borough and grade in grades, and count the document&#x20;

  4\. Sort by count and print five documents

  ```python
  [{'_id': {'borough': 'Queens', 'grade': 'A'}, 'count': 443},
  {'_id': {'borough': 'Manhattan', 'grade': 'A'}, 'count': 232},
  {'_id': {'borough': 'Queens', 'grade': 'B'}, 'count': 140},
  {'_id': {'borough': 'Manhattan', 'grade': 'B'}, 'count': 49},
  {'_id': {'borough': 'Brooklyn', 'grade': 'A'}, 'count': 38}]
  ```

### Exercise 3

* Find international airports in California using aggregate framework

1. states collection&#x20;
   * Get the California location (Polygon)
2. airports collection&#x20;
   * Find “intl” in name field (text)&#x20;
   * Find airports in the California&#x20;
   * Project name, type and code (w/o \_id)&#x20;
   * Sort by name (ascending order) and code (descending order)

### Exercise 4

* Find Korean restaurants that are located at less 2km from international airports (use the type field).&#x20;

  * Warn: should create 2dsphere index to address for restaurants collection.

  ```python
  [{'_id': ObjectId('5c8626729f44c9f6767fe0af'),
  'address': {'building': '0',
               'coord': 497.9151037865528,
               'street': 'Jfk International Airpor',
               'zipcode': '11430'},
   'borough': 'Queens',
   'cuisine': 'Korean',
   'grades': [{'date': datetime.datetime(2014, 6, 18, 0, 0),
               'grade': 'A',
               'score': 2},
              {'date': datetime.datetime(2014, 1, 14, 0, 0),
               'grade': 'A',
               'score': 12},
              {'date': datetime.datetime(2013, 1, 4, 0, 0),
               'grade': 'A',
               'score': 11},
              {'date': datetime.datetime(2012, 4, 25, 0, 0),
               'grade': 'A',
               'score': 13}],
   'name': 'Korean Lounge',
   'restaurant_id': '40625198'}]
  ```

## Exercise 5

* Print the average, minimum, and maximum of passengers(ride\_pasgr\_numfield) of each subway line.&#x20;
* Hint: group by the ‘line\_num’ field and use $max, $min, and $avg.&#x20;
* Result

  ```python
  [{'_id': '경원선', 'avg_station': 7413.872106666667, 'max_station': 27076, 'min_station': 1},
  {'_id': '일산선', 'avg_station': 11100.640561622466, 'max_station': 25857, 'min_station': 1},
  {'_id': '1호선', 'avg_station': 26936.31375, 'max_station': 70147, 'min_station': 4815},
  ...
  {'_id': '수인선', 'avg_station': 3436.322596153846, 'max_station': 9023, 'min_station': 25},
  {'_id': '경강선', 'avg_station': 2294.190909090909, 'max_station': 9201, 'min_station': 192}]
  ```

## Exercise 6

* Print the total number of passengers (ride\_pasgr\_numfield) during 2017-12-21 \~ 2017-12-28.
* ‘use\_dt’ is a stringtype.&#x20;
* Hint: use $match,$gt$lt, $sum, $sort&#x20;
* Result

  ```python
  [{'_id': {'use_dt': '20171221'}, 'total_rider_pasgr': 8505895},
  {'_id': {'use_dt': '20171222'}, 'total_rider_pasgr': 8917017},
  {'_id': {'use_dt': '20171223'}, 'total_rider_pasgr': 6781294},
  {'_id': {'use_dt': '20171224'}, 'total_rider_pasgr': 4866330},
  {'_id': {'use_dt': '20171225'}, 'total_rider_pasgr': 5096990},
  {'_id': {'use_dt': '20171226'}, 'total_rider_pasgr': 7935515},
  {'_id': {'use_dt': '20171227'}, 'total_rider_pasgr': 7906780},
  {'_id': {'use_dt': '20171228'}, 'total_rider_pasgr': 8056509}]
  ```
