Data Science in Python Interview Questions and Answers


1. When do you use *args?

*args lets a function accept any number of positional arguments. Use it when you don't know in advance how many values will be passed. Inside the function, args is a tuple holding all of them.

def func(*args):
    # args is a tuple of all positional arguments passed to func
    for i in args:
        print(i)

func(3, 1, 4, 7)
# func accepts any number of arguments, e.g. func(3) or func(3, 1, 4, 7, 9)

3
1
4
7
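Because args is an ordinary tuple, the function can also pass it straight to built-ins such as sum(). To send the elements of an existing list or tuple as separate arguments, unpack it with * at the call site. A minimal sketch (total is a hypothetical helper name):

```python
def total(*args):
    # args is a tuple containing every positional argument passed in
    return sum(args)

values = [3, 1, 4, 7]
print(total(3, 1, 4, 7))  # arguments passed one by one: 15
print(total(*values))     # * unpacks the list into separate arguments: 15
print(total())            # zero arguments is allowed too; args is () and the sum is 0
```

Calling total(values) without the * would pass the whole list as a single argument, so args would be ([3, 1, 4, 7],) and sum() would fail on the nested list.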

2. How do you create a random 1D array?

NumPy's np.random.rand(n) returns a one-dimensional array of n floats drawn uniformly from the interval [0, 1).

import numpy as np
array = np.random.rand(5)
print("1D Array filled with random values :", array)

1D Array filled with random values : [ 0.40537358 0.32104299 0.02995032 0.73725424 0.10978446]
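NumPy also provides a newer Generator interface via np.random.default_rng(), which NumPy's documentation recommends for new code. A minimal sketch; the seed value 42 is arbitrary and is only there to make the output reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator for reproducible results

floats = rng.random(5)                # 5 floats drawn uniformly from [0, 1)
ints = rng.integers(0, 10, size=5)    # 5 integers from 0 up to (not including) 10
normals = rng.standard_normal(5)      # 5 draws from a standard normal distribution

print(floats)
print(ints)
print(normals)
```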

3. What will be the output of this code snippet?

list1 = [1, 2, 3, 4, 1, 1, 1, 4, 5]

print(list1.index(10))

Explanation: This line raises a ValueError. list.index(x) returns the position of the first occurrence of x, and since 10 does not appear in list1, Python raises ValueError: 10 is not in list. In general, a ValueError means a function received an argument of the right type but with an inappropriate value.

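By contrast, when the value is present, index() returns the position of its first occurrence. The value 4 first appears at position 3 (positions start at 0):

list1 = [1, 2, 3, 4, 1, 1, 1, 4, 5]
print(list1.index(4))

3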
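If a missing value should not stop the program, catch the ValueError or test membership first. A minimal sketch; safe_index is a hypothetical helper, not part of Python:

```python
def safe_index(items, value, default=-1):
    # Hypothetical helper: like list.index(), but returns `default`
    # instead of raising ValueError when `value` is absent.
    try:
        return items.index(value)
    except ValueError:
        return default

list1 = [1, 2, 3, 4, 1, 1, 1, 4, 5]
print(safe_index(list1, 4))   # first occurrence of 4: 3
print(safe_index(list1, 10))  # not found, so the default -1 is returned
print(10 in list1)            # a membership test never raises: False
```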

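4. How do you list multiple csv files in a folder?

Use listdir() from Python's os module. os.listdir() returns the names of all entries in a directory (the current working directory if no path is given), so to get only the CSV files you still need to filter the names by their .csv extension. Without a filter, the result includes every file: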

import os
os.listdir()

['.ipynb_checkpoints', 'Capture.PNG', 'cars.csv', 'foo.pdf', 'Pandas numpy matplotlib seaborn.ipynb']
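Filtering the listing by extension can be sketched as follows; list_csv_files is a hypothetical helper name and the default folder "." (the current directory) is an assumption:

```python
import os

def list_csv_files(folder="."):
    # os.listdir() returns bare entry names (no directory part) in arbitrary order;
    # keep only names ending in ".csv", ignoring case, and sort them
    return sorted(name for name in os.listdir(folder) if name.lower().endswith(".csv"))

print(list_csv_files())
```

For the listing shown above, this would keep only 'cars.csv'. An alternative is glob.glob(os.path.join(folder, "*.csv")), which returns paths matching a shell-style pattern; its case sensitivity follows the operating system.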

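5. What is the difference between pivot_table and groupby in pandas?

Both pivot_table and groupby aggregate a DataFrame, and with the same keys and aggregation function they compute the same numbers. The difference is the shape of the result: pivot_table spreads one of the keys across the columns and returns a wide DataFrame, while groupby returns a long Series (or DataFrame) indexed by every combination of the keys.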

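import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})

# Wide result: one row per value of "a", one column per value of "b"
pvt_tbl = pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc="sum")

pvt_tbl

# Long result: one row per (a, b) pair
df.groupby(['a','b'])['c'].sum()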
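To see that only the shape differs, the groupby result can be reshaped with unstack(), which moves the inner index level into the columns. A minimal sketch using fixed values for "c" (made up, instead of random ones) so the two tables can be compared directly:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]})  # fixed, made-up values

# groupby gives a "long" Series indexed by (a, b) pairs
long_result = df.groupby(["a", "b"])["c"].sum()

# unstack() moves the inner index level "b" into the columns,
# producing the same "wide" layout that pivot_table builds
wide_from_groupby = long_result.unstack()
wide_from_pivot = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

print(wide_from_groupby)
print(wide_from_pivot)
```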

6. What is the difference between loc and iloc?

  • loc gets rows (or columns) with particular labels from the index
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers)
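import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
s.iloc[:3] # the first three rows by position, i.e. positions 0, 1, 2 (labels 49, 48, 47)
s.loc[:3]  # from the first row up to and including the row labeled 3 (eight rows)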
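The difference is easier to see when the values are distinguishable. This sketch uses the same index but fills the Series with 0 to 9 instead of NaN, so each value equals its position:

```python
import pandas as pd

s = pd.Series(range(10), index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]  # positions 0, 1, 2 -> labels 49, 48, 47
by_label = s.loc[:3]      # from the first row up to and including label 3

print(list(by_position.index))  # [49, 48, 47]
print(list(by_label.index))     # [49, 48, 47, 46, 45, 1, 2, 3]
print(s.loc[1], s.iloc[1])      # label 1 is the 6th row (value 5); position 1 has value 1
```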

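7. What is the difference between the apply and applymap functions in pandas?

  • apply() applies a function along an axis of the DataFrame: each call receives a whole column (axis=0, the default) or a whole row (axis=1).
  • applymap() applies a function to a DataFrame elementwise: each call receives a single value. Since pandas 2.1, applymap() is deprecated in favor of the equivalent DataFrame.map().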
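import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
df.apply(np.sum, axis=0)  # column sums: A -> 12, B -> 27
df.apply(np.sum, axis=1)  # row sums: 13 for each row
f = lambda x: x + 2
df.applymap(f)            # adds 2 to every cell; use df.map(f) in pandas 2.1+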
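The sketch below makes the shapes of the results explicit. It passes lambdas to apply() to emphasize that each call receives a whole Series, and it picks DataFrame.map when it exists (pandas 2.1+), falling back to applymap on older versions:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# apply() hands the function one whole column (axis=0) or row (axis=1) as a Series
col_sums = df.apply(lambda col: col.sum(), axis=0)  # one value per column
row_sums = df.apply(lambda row: row.sum(), axis=1)  # one value per row

# applymap() hands the function one cell at a time; pandas 2.1+ calls it DataFrame.map
add_two = lambda x: x + 2
elementwise = df.map(add_two) if hasattr(df, "map") else df.applymap(add_two)

print(col_sums.tolist())            # [12, 27]
print(row_sums.tolist())            # [13, 13, 13]
print(elementwise.values.tolist())  # [[6, 11], [6, 11], [6, 11]]
```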

8. Suppose you are given a cars dataset with a Horsepower column. How would you create a variable HP that takes only two values: "Low HP" if Horsepower < 100 and "High HP" if Horsepower >= 100?

import pandas as pd

cars = pd.read_csv('cars.csv')
cars.head()
# Label each car by comparing its Horsepower with 100
f = lambda x: "Low HP" if x < 100 else "High HP"
cars['hp'] = cars.Horsepower.apply(f)
cars.head()
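A vectorized alternative is numpy.where, which evaluates the condition on the whole column at once instead of calling a Python function per row. A minimal sketch; the four Horsepower values are made up and stand in for cars.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for cars.csv
cars = pd.DataFrame({"Horsepower": [95, 100, 150, 70]})

# np.where(condition, value_if_true, value_if_false), applied element by element
cars["hp"] = np.where(cars["Horsepower"] < 100, "Low HP", "High HP")
print(cars)
```

Note that a value of exactly 100 falls in "High HP", matching the >= 100 rule in the question.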

9. What are the different parts of a plot in matplotlib?

A Matplotlib plot can be divided into the following parts:

  • Figure

The whole figure. The figure keeps track of all the child Axes, a smattering of ‘special’ artists (titles, figure legends, etc.), and the canvas. A figure can have any number of Axes, but to be useful it should have at least one.

  • Axes

This is what you think of as ‘a plot’: the region of the image with the data space. A given figure can contain many Axes, but a given Axes object can only be in one Figure. The Axes contains two (or three in the case of 3D) Axis objects (be aware of the difference between Axes and Axis) which take care of the data limits (the data limits can also be set via the set_xlim() and set_ylim() Axes methods). Each Axes has a title (set via set_title()), an x-label (set via set_xlabel()), and a y-label (set via set_ylabel()).

  • Axis

These are the number-line-like objects. They take care of setting the graph limits and generating the ticks (the marks on the axis) and ticklabels (strings labeling the ticks). The location of the ticks is determined by a Locator object and the ticklabel strings are formatted by a Formatter. The combination of the correct Locator and Formatter gives very fine control over the tick locations and labels.

  • Artist

Basically everything you can see on the figure is an artist (even the Figure, Axes, and Axis objects). This includes Text objects, Line2D objects, collection objects, Patch objects ... (you get the idea). When the figure is rendered, all of the artists are drawn to the canvas. Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another.
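The hierarchy above can be inspected directly from code. A minimal sketch, assuming the non-interactive Agg backend so it runs without a display; the plotted points and labels are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()                 # a Figure holding one Axes
(line,) = ax.plot([0, 1, 2], [0, 1, 4])  # a Line2D artist drawn inside the Axes
ax.set_title("Title")                    # Axes-level title
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_xlim(0, 2)                        # data limits, handled by the x Axis

# Walk down the hierarchy: Figure -> Axes -> Axis -> tick Locator/Formatter
print(ax in fig.axes)                                 # the Axes belongs to this Figure
print(type(ax.xaxis).__name__)                        # the x Axis object
print(type(ax.xaxis.get_major_locator()).__name__)    # decides where ticks go
print(type(ax.xaxis.get_major_formatter()).__name__)  # turns tick values into labels

# Figure, Axes, Axis and Line2D are all Artists
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))
```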


10. What are Subplots in matplotlib?

Subplots are a grid of Axes (individual plots) within a single Figure. They can be created with the subplots() function from the matplotlib.pyplot module, which returns the Figure together with the Axes object(s).

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

## one figure with one subplot

fig, ax = plt.subplots(ncols=1,nrows=1)
ax.plot(x, y)
plt.show()
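With more than one row and column, subplots() returns the Axes as a 2D NumPy array indexed [row, column]. A minimal sketch of a 2 x 2 grid, assuming the Agg backend so it runs headless; the choice of plot in each cell is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

# A 2 x 2 grid: axes is a 2D NumPy array of Axes, indexed [row, column]
fig, axes = plt.subplots(nrows=2, ncols=2)
axes[0, 0].plot(x, y)
axes[0, 1].plot(x, -y)
axes[1, 0].scatter(x[::20], y[::20])
axes[1, 1].hist(y, bins=20)
fig.tight_layout()  # reduce overlap between neighboring subplots
```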

11. How do you plot a histogram in matplotlib?

import matplotlib.pyplot as plt

# cars is the DataFrame loaded from cars.csv in question 8
fig1, ax1 = plt.subplots()
ax1.hist(cars.Horsepower)
plt.show()
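ax.hist() also returns the bin counts, the bin edges and the drawn bars, which is handy for checking how the data was binned. A minimal sketch with made-up values and an explicit bins argument, assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # made-up values

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the drawn bar patches
counts, edges, patches = ax.hist(data, bins=4, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
print(counts)  # [1. 2. 3. 4.]
print(edges)   # 5 edges evenly spaced from 1 to 4
```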

12. What is the difference between remove(), del and pop() in Python?

  • remove() removes the first matching value in a given list
  • del removes the item at a specific index (it is a statement, not a function, and it also accepts slices)
  • pop() removes the item at a specific index and returns it; with no index, it removes and returns the last item.
# remove() removes the first matching value, not a specific index:
a = [0, 2, 3, 2]
a.remove(2)
a

[0, 3, 2]

# del removes the item at a given index:
a = [3, 2, 2, 1]
del a[3]
a

[3, 2, 2]

# pop() removes the item at a given index and returns it:
a = [4, 3, 5]
a.pop(1)

3

a

[4, 5]
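A few more behaviors worth knowing in an interview: pop() without an index takes the last item, del works on slices, and remove() raises ValueError for a missing value. A minimal sketch with made-up numbers:

```python
nums = [10, 20, 30, 40, 50]

last = nums.pop()   # with no index, pop() removes and returns the last item
del nums[0:2]       # del also accepts slices: removes the first two items

try:
    nums.remove(99)  # remove() raises ValueError when the value is absent
except ValueError as err:
    print("ValueError:", err)

print(last, nums)   # 50 [30, 40]
```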

13. What is the difference between list.append() and list.extend()?

append() adds its argument as a single element to the end of a list. The length of the list increases by one.

extend() iterates over its argument and adds each element to the list. The length of the list increases by the number of elements in the iterable argument.

x = ["1", "2", "3","new","old"]
x.extend([4, 5])
print(x)
print("Length of list is :",len(x))

['1', '2', '3', 'new', 'old', 4, 5]
Length of list is : 7

x = ["1", "2", "3","new","old"]
x.append([4, 5])
print(x)
print("Length of list is :",len(x))

['1', '2', '3', 'new', 'old', [4, 5]]
Length of list is : 6
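Because extend() iterates over its argument, passing a string adds one element per character, while append() adds the string whole. The += operator on a list behaves like extend(). A minimal sketch:

```python
x = [1, 2]
x.extend("ab")    # a string is iterable, so each character becomes its own element
print(x)          # [1, 2, 'a', 'b']

y = [1, 2]
y.append("ab")    # the whole string is added as one element
print(y)          # [1, 2, 'ab']

z = [1, 2]
z += (3, 4)       # += on a list extends it in place with any iterable
print(z)          # [1, 2, 3, 4]
```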

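14. How do you save a plot as a PDF?

import matplotlib.pyplot as plt

f = plt.figure()
plt.plot(range(10), range(10), "o")
# Save before plt.show(); calling plt.savefig() after show() can produce an empty file
f.savefig("foo.pdf")  # the .pdf extension selects the PDF format
plt.show()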
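To put several figures into one multi-page PDF, matplotlib provides PdfPages in matplotlib.backends.backend_pdf. A minimal sketch; save_figures_to_pdf is a hypothetical helper, the file name plots.pdf is arbitrary, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def save_figures_to_pdf(path, n_pages=3):
    # Hypothetical helper: writes one figure per page into a single PDF file
    with PdfPages(path) as pdf:
        for k in range(1, n_pages + 1):
            fig, ax = plt.subplots()
            ax.plot(range(10), [k * i for i in range(10)], "o")
            ax.set_title(f"Page {k}")
            pdf.savefig(fig)  # append this figure as a new page
            plt.close(fig)    # free memory once the page is written
        return pdf.get_pagecount()

pages = save_figures_to_pdf("plots.pdf")
print(pages)  # 3
```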

15. How do you check the class of an object in Python?

type(cars)

pandas.core.frame.DataFrame
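type() returns the exact class, while isinstance() checks whether an object is an instance of a class or any of its subclasses, which is usually what a type check needs. A minimal sketch; the small DataFrame is a made-up stand-in for cars:

```python
import pandas as pd

cars_like = pd.DataFrame({"Horsepower": [130, 165]})  # small made-up stand-in for cars

print(type(cars_like))                      # the exact class of the object
print(type(cars_like).__name__)             # DataFrame
print(isinstance(cars_like, pd.DataFrame))  # True

# isinstance() also accepts subclasses; an exact type() comparison does not
print(isinstance(True, int))  # True, because bool is a subclass of int
print(type(True) is int)      # False
```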

16. How do you check the data type of each column in a pandas DataFrame?

cars.dtypes

Car              object
MPG             float64
Cylinders         int64
Displacement    float64
Horsepower        int64
Weight            int64
Acceleration    float64
Model             int64
Origin           object
hp               object
dtype: object
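Beyond listing the dtypes, a common next step is selecting columns by type, for example all numeric columns, with DataFrame.select_dtypes(). A minimal sketch using made-up rows that mimic a few of the cars columns:

```python
import pandas as pd

# Made-up rows with a few of the cars columns
df = pd.DataFrame({"Car": ["Car A", "Car B"],
                   "MPG": [18.0, 15.0],
                   "Cylinders": [8, 8]})

print(df.dtypes)         # one dtype per column
print(df.dtypes["MPG"])  # float64

# Keep only the numeric columns (integers and floats)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)      # ['MPG', 'Cylinders']
```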

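17. How do you add titles to subplots in matplotlib?

import matplotlib.pyplot as plt

# x and y as defined in question 10
fig, axarr = plt.subplots(2, sharex=True, sharey=True)
axarr[0].plot(x, y)
axarr[0].set_title('Subplot 1')  # set_title() titles a single Axes
axarr[1].scatter(x, y)
axarr[1].set_title('Subplot 2')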
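For larger grids it is convenient to loop over axes.flat, and fig.suptitle() adds one title for the whole Figure. A minimal sketch assuming the Agg backend; the titles and scaling factors are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

fig, axes = plt.subplots(2, 2)
fig.suptitle("Overall figure title")        # one title for the whole Figure
for i, ax in enumerate(axes.flat, start=1):  # axes.flat walks the 2D array row by row
    ax.plot(x, y * i)
    ax.set_title(f"Subplot {i}")            # one title per Axes
fig.tight_layout()
```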

18. What is broadcasting in numpy?

Broadcasting is NumPy's ability to perform arithmetic on arrays of different shapes. Element-by-element operations normally require two arrays of the same shape; when the shapes differ but satisfy the broadcasting rules below, NumPy virtually stretches the smaller array along its missing or length-1 dimensions so that both arrays have compatible shapes, without copying the data.

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0])
a * b
array([2., 4., 6.])

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (rightmost) dimensions and works its way left; if one array has fewer dimensions, its shape is treated as if padded with 1s on the left. Two dimensions are compatible when

  • they are equal, or
  • one of them is 1.

If these conditions are not met, NumPy raises a ValueError ("operands could not be broadcast together with shapes ..."), indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

a = np.array([1.0, 2.0, 3.0])
b = np.array([5.0, 2.0])
a * b  # raises ValueError: trailing dimensions 3 and 2 are neither equal nor 1
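The rules also explain how a column vector and a row vector combine into a 2D grid. A minimal sketch with made-up values: a (3, 1) array plus a (4,) array gives a (3, 4) result, while the mismatched example above fails:

```python
import numpy as np

col = np.array([[0.0], [10.0], [20.0]])  # shape (3, 1)
row = np.array([1.0, 2.0, 3.0, 4.0])     # shape (4,), treated as (1, 4)

# Right to left: 1 vs 4 -> compatible (one is 1); 3 vs 1 -> compatible
grid = col + row                         # result shape (3, 4)
print(grid.shape)  # (3, 4)
print(grid)

try:
    np.array([1.0, 2.0, 3.0]) * np.array([5.0, 2.0])  # 3 vs 2 -> incompatible
except ValueError as err:
    print("ValueError:", err)
```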

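19. What will be the output of the code fig, axarr = plt.subplots(3, 1, sharex=True, sharey=True)?

This line creates one figure with 3 subplots arranged in 3 rows and one column. Because of sharex=True and sharey=True, all three subplots share both the x axis and the y axis, and axarr is a 1D array of the 3 Axes objects.

import matplotlib.pyplot as plt

# x and y as defined in question 10; plt.subplots(3, ...) is the same as plt.subplots(3, 1, ...)
fig, axarr = plt.subplots(3, sharex=True, sharey=True)
fig.suptitle('Sharing both axes')
axarr[0].plot(x, y)
axarr[1].scatter(x, y)
axarr[2].scatter(x, 2 * y ** 2 - 1, color='r')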
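Sharing means the subplots stay linked: changing the limits of one changes them for all. A minimal sketch that checks this, assuming the Agg backend; the limit values are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

fig, axarr = plt.subplots(3, 1, sharex=True, sharey=True)
print(axarr.shape)  # (3,): a 1D array because there is only one column

axarr[0].plot(x, y)
# With shared axes, changing the limits of one subplot changes all of them
axarr[0].set_xlim(0, 3)
axarr[0].set_ylim(-2, 2)
print(axarr[2].get_xlim(), axarr[1].get_ylim())
```

With sharex=True, matplotlib also hides the x tick labels on all but the bottom subplot by default.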

20. When do you create a bee swarm plot in Python?

Use swarmplot() from the seaborn library when you want to show the distribution of values together with every individual observation: each point is drawn, and points are shifted along the categorical axis so that they do not overlap. Avoid it when you have a large number of observations, because it does not scale well to large datasets.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.swarmplot(x="day", y="total_bill", data=tips)
ax = sns.boxplot(x="day", y="total_bill", data=tips,
     showcaps=False,boxprops={'facecolor':'None'},
     showfliers=False,whiskerprops={'linewidth':0})
 
plt.show()