ANLY620: 150 words for each question

Q1.

Data is created constantly from a wide range of sources. These include social media, mobile sensors, medical imaging, video surveillance, gene sequencing, video rendering, and many other tools (EMC, 2015). Big data has several defining characteristics, including volume, complexity, and speed. Several tools have been developed to store and manage the huge volumes of data being created. These tools are known as repositories.

According to Brook (2018), repositories include data warehouses, data lakes, data marts, metadata repositories, and data cubes. Data warehouses aggregate data from multiple sources, and the data does not need to be related. Data lakes store unstructured data, classify it, and tag it with metadata. Data marts “are subsets of data repositories” (Brook, 2018). Metadata repositories store data about data and databases. These repositories explain the source of the data, how it was obtained, and what it represents. Data cubes are lists of data with three or more dimensions, stored as a table (as in a spreadsheet).
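To make the idea of a data cube a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sales figures, the dimension names (product, region, quarter), and the roll_up helper are all made up and do not come from Brook (2018).

```python
from collections import defaultdict

# Hypothetical cube: one measure (units sold) indexed by three dimensions.
DIMENSIONS = ("product", "region", "quarter")

cube = {
    ("laptop", "east", "Q1"): 120,
    ("laptop", "west", "Q1"): 90,
    ("laptop", "east", "Q2"): 150,
    ("phone", "east", "Q1"): 200,
    ("phone", "west", "Q2"): 170,
}


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        reduced_key = tuple(key[i] for i in positions)
        totals[reduced_key] += value
    return dict(totals)


# Collapse the cube to one dimension, then to two.
by_product = roll_up(cube, ["product"])
by_region_quarter = roll_up(cube, ["region", "quarter"])

print(by_product)
print(by_region_quarter)
```

Each key holds one value per dimension, so the same data can be summarized along any combination of dimensions, which is what a spreadsheet pivot does with a table.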

In short, there are data repositories, and repositories within them, that break data down into more usable information. Ultimately, it is up to the analyst to mine this data, clean it, and analyze it to make better use of it. Data gathering and analysis are the driving force behind today’s smart business decisions.

References

Brook, C. (2018, December 5). What is a Data Repository? Retrieved from Data Insider: https://digitalguardian.com/blog/what-data-repository

EMC. (2015). Introduction to Big Data Analytics. In E. E. Services, Data Science and Big Data Analytics. Indianapolis: John Wiley & Sons, Inc.

Q2.

First, let us explain what the LINEST function does in Excel and how it is used. According to Microsoft (n.d.), LINEST uses the least-squares method to calculate the straight line (y = bx + a) that best fits the data and returns an array that describes the line. LINEST can also be combined with other functions to calculate statistics for other models that are linear in the unknown parameters, such as polynomial, logarithmic, exponential, and power series. It can be entered as a regular formula to obtain a single value (the slope coefficient) or as an array formula to obtain an array of values (slope coefficient, standard error of the slope, R2, F-observed value, etc.).

The LINEST function syntax is =LINEST(known_y’s, [known_x’s], [const], [stats]). Only known_y’s is required; known_x’s, const, and stats are optional. The known_y’s argument is the range of the dependent values. The known_x’s argument is the range of the independent values, and if it is omitted, Excel assumes the array 1, 2, 3, 4, … with as many values as known_y’s. The const argument is a logical value that determines how the intercept (constant a) is treated. If it is omitted or set to TRUE, the intercept is calculated normally (y = bx + a); if it is set to FALSE, the intercept is forced to 0 and the slope is fitted to y = bx. Finally, the stats argument is a logical value that determines whether to output additional statistics. If it is set to TRUE, the function returns an array with additional regression statistics. If it is omitted or set to FALSE, the function returns only the slope coefficient(s) and the intercept.

To obtain only the first slope coefficient, type =LINEST(known_y’s) in a single cell and press Enter. To obtain both the slope and the intercept, select two adjacent cells in the same row, type =LINEST(known_y’s, [known_x’s]), and press Ctrl + Shift + Enter. In current versions of Microsoft 365, pressing Enter is enough, because the result spills into the neighboring cells automatically. In multiple regression, all the independent variables need to be in adjacent columns, and the whole range is used as the known_x’s argument. Also, in multiple regression, LINEST returns the slope coefficients from right to left. For example, for X1, X2, and X3, LINEST returns the slope coefficients in X3, X2, X1 order, followed by the intercept.

Finally, to obtain additional regression statistics, such as R2 and the F statistic, use the formula =LINEST(known_y’s, [known_x’s], TRUE, TRUE) or =LINEST(known_y’s, [known_x’s], , TRUE). Select a range of five rows and one more column than the number of independent variables (two columns for a single predictor), type the formula, and press Ctrl + Shift + Enter. The rows contain, in order, the coefficients, their standard errors, R2 and the standard error of the y estimate, the F statistic and df, and the regression sum of squares and the residual sum of squares. I hope this helps everyone to understand the LINEST function.
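For anyone who wants to check the numbers outside Excel, here is a minimal Python sketch of the statistics LINEST reports for a single predictor with const and stats both TRUE. It is my own illustration, not Microsoft's implementation: the function name linest_simple and the sample data are made up, and the sketch assumes one independent variable, a fitted intercept, and data that do not fall on a perfectly straight line (otherwise the F statistic would divide by zero).

```python
import math


def linest_simple(known_ys, known_xs=None):
    """Return a 5 x 2 table laid out like LINEST(ys, xs, TRUE, TRUE) for one predictor."""
    n = len(known_ys)
    if known_xs is None:
        # Same default Excel uses when known_x's is omitted: 1, 2, 3, ...
        known_xs = list(range(1, n + 1))

    x_mean = sum(known_xs) / n
    y_mean = sum(known_ys) / n

    # Least-squares slope and intercept for y = b*x + a.
    sxx = sum((x - x_mean) ** 2 for x in known_xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Sums of squares.
    ss_resid = sum((y - (slope * x + intercept)) ** 2
                   for x, y in zip(known_xs, known_ys))
    ss_total = sum((y - y_mean) ** 2 for y in known_ys)
    ss_reg = ss_total - ss_resid

    df = n - 2                                  # residual degrees of freedom
    se_y = math.sqrt(ss_resid / df)             # standard error of the y estimate
    se_slope = se_y / math.sqrt(sxx)
    se_intercept = se_y * math.sqrt(1 / n + x_mean ** 2 / sxx)
    r_squared = ss_reg / ss_total
    f_stat = ss_reg / (ss_resid / df)           # one predictor, so ss_reg has 1 df

    return [
        [slope, intercept],
        [se_slope, se_intercept],
        [r_squared, se_y],
        [f_stat, df],
        [ss_reg, ss_resid],
    ]


# Made-up sample data, only for illustration.
table = linest_simple([2, 4, 5, 4, 5], [1, 2, 3, 4, 5])
for row in table:
    print([round(value, 4) for value in row])
```

For this sample data the slope is 0.6, the intercept is 2.2, R2 is 0.6, and the F statistic is 4.5, which can be compared against the same values typed into a worksheet.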

Vr

Rommel Blanco

Resources

Microsoft. (n.d.). LINEST function. https://support.microsoft.com/en-us/office/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d