APPLICATION OF THE NON-PARAMETRIC SIGN TEST TO A COMPANY

Objective: The aim of this paper is to show the application of the Non-Parametric Sign Test in problems involving the testing of central tendency values. Theoretical framework: Non-parametric methods are widely used in the study of populations that are taken in rank order (such as a movie that receives one to four star ratings). The use of non-parametric methods may also be necessary when the data has a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of scale, non-parametric methods result in data that is "in order" (Thatcher et al., 2005). Method: Data were taken from a company in the south of the State of Rio de Janeiro and a case study was carried out using the Non-Parametric Sign Test. Final Considerations: It initially proved unfeasible to use a Parametric Test because the Anderson-Darling Test showed that the Assumption of Normality was not confirmed; the Non-Parametric Sign Test then showed that the hypothesized Median really is the correct measure of Central Tendency. Implications of the research: The use of Non-Parametric Tests is widespread in the scientific literature and has proven highly effective in dealing with data for which the assumptions of Normality are not confirmed. Originality/value: Although they are well-known statistical tools, Non-Parametric Tests remain widely used and can bring innovations to their application, as in the case of the company in question.


INTRODUCTION
During the last century, statistics revolutionized science by presenting useful models that modernized the research process toward better research parameters, making it possible to guide decision-making in a wide variety of areas. Statistical methods were developed as a mixture of science and logic for the investigation and solution of problems in various areas of human knowledge (Akdur, 2022; Antonio et al., 2023; de Araújo et al., 2021; Mazza et al., 2022; Sampaio et al., 2024; Silva et al., 2023). The launch of a new product and/or process usually involves working with a large number of variables. Conscientious planning of the experiments used to manipulate these variables and arrive at the desired answers is indispensable if reliable results are to be obtained and consistent statistical analyses are to be carried out. In this context, it is no longer possible to develop products and processes empirically, as was done in the past. Strong competition, the diffusion of technological processes and the responsibility of the scientific community now make such procedures impossible. The optimization of processes and products requires, more than ever, a robust statistical study (Cardoso et al., 2022; Carvalho, 2023; da Motta Reis et al., 2023; F. da S. Gomes et al., 2022; F. M. Gomes et al., 2023; Oliveira et al., 2023; Sales et al., 2022; Sampaio et al., 2024).
Non-parametric methods are widely used in the study of populations that are taken in rank order (such as a movie that receives one to four star ratings). The use of non-parametric methods may also be necessary when the data has a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of scale, non-parametric methods result in data that is "in order" (Thatcher et al., 2005).
The aim of this article is to present the results obtained with the Non-Parametric Sign Test in a case study at a company in the South Fluminense Region of the State of Rio de Janeiro, Brazil.

THEORETICAL BACKGROUND
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than non-parametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normally distributed. However, parametric tests can be misleading when this assumption is not satisfied. In this circumstance, non-parametric tests are the alternative methods available, because they do not require the normality assumption. Non-parametric tests are statistical methods based on signs and ranks (Nahm, 2016). Efficiency, in its broadest sense, refers to the cost, time, and effort required to use a test. In terms of statistical power, efficiency refers to the minimum sample size necessary to detect a false null hypothesis. The smaller the sample necessary to detect a treatment effect, the more efficient, or powerful, the statistic. An index that compares one test's requirements in terms of sample size to those of an alternative test is the Relative Efficiency (RE). The RE is the ratio of the sample size each test requires to achieve a desired power level. To be a fair index, the nominal alpha of the two competing tests must be maintained at the same level while testing the identical hypothesis. The statistic that requires the smaller sample size is the more efficient test. The RE is relative because it depends on alpha and the distribution. An index is therefore needed that compares the efficiency of competing statistical tests under many different conditions (Sawilowsky, 1990).
The use of statistical criteria as a basis for choosing between PAR and NPAR tests has a long and controversial history. In the evaluation of any statistical test, the two distributional characteristics of primary interest are its ability to control the Type I error rate at nominal (i.e., researcher-specified) levels and its statistical power. A test that controls its Type I error rate at nominal levels and generates good statistical power is deemed the procedure of choice, and hence these two properties are usually used as the basis of evaluation. The extent to which a test controls its Type I error rate is related to how well the underlying assumptions of the test are satisfied by the data and the sensitivity of the test to departures from these assumptions. If departures from the underlying assumptions do not seriously impair the distributional properties of a test, the test is considered to be robust. One framework for examining these assumptions is how well the statistical model (i.e., the test) fits the observed data. A good fit implies the test should control its Type I error rate at nominal levels, whereas a poor fit indicates otherwise. Naturally, no data set ever satisfies all the underlying assumptions perfectly, and hence the fit is never perfect and the evaluation of Type I error properties is complicated. A basic method of studying these properties is the use of computer-simulated data to compare the distributional performance of tests across a variety of data conditions (e.g., sample size, underlying distribution). Fortunately, there is a good deal of evidence available to evaluate the robustness of PAR and NPAR tests. A second statistical criterion is power, and here simulation evidence and conventional wisdom collide. A reason often cited for not performing NPAR tests is that such analyses result in a substantial drop in statistical power. If the underlying distribution is normal and a rank transformation is used, the drop-off in power for moderately large samples is, for a wide class of rank tests, only a few points. For non-normal distributions, there is a good deal of empirical evidence showing that NPAR tests often enjoy a power advantage over their PAR competitors. For example, simulations have shown that for a variety of non-normal, unimodal distributions often observed in practice, the power advantages of NPAR over PAR tests can be greater than 20 points (de Souza Sampaio et al., 2022; Espuny et al., 2023; F. M. Gomes et al., 2023; Harwell, 1988; Leoni et al., 2017; Mazza et al., 2022, 2024; Silva et al., 2023).
Recently, non-parametric statistical procedures have been considered for use in analyzing the results of a very large number of problems in many different areas (García et al., 2009;Malik et al., 2021).
When a researcher uses non-parametric tests, it is assumed that the distribution of the experimental data is not normal, or that there is not enough information to say that it is. When in doubt about this information, there is nothing to stop the researcher from opting for non-parametric statistics. What the researcher cannot do, in any way, is argue in terms of standard deviations or standard errors, although arguing purely and simply in terms of averages remains perfectly possible. Non-parametric tests, also known as distribution-free tests, are based on certain hypotheses but do not assume a normal organization of the data. Generally, their statistical results come from orderings of the data, which makes them easier to understand, but they do have some limitations, including that they are not as powerful when the normality hypothesis is actually met. This can lead to false null hypotheses not being rejected. Another of their limitations is that they require the hypothesis to be changed when the test does not correspond to the procedural question if the sample is not proportional (Oprime et al., 2015; Rocha & Bacelar Júnior, 2018). When the distribution of the data is asymmetric, non-parametric methods prove more efficient (Montgomery, 2004).
Parametric analysis assumes that the population is normal, so the first step is to check whether the population is in fact normal. Normality tests are used in different sectors. One application of normality tests is to the residuals from a linear regression model. If they are not normally distributed, the residuals should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests and chi-squared tests. If the residuals are not normally distributed, the dependent variable, or at least one explanatory variable, may have the wrong functional form, or important variables may be missing; correcting one or more of these systematic errors may produce residuals that are normally distributed. In parametric statistics, it is assumed that the samples are drawn from fully specified distributions characterized by one or more unknown parameters about which we want to make inferences. In a non-parametric method, it is assumed that the source distribution of the sample is unspecified, and we often want to make inferences about the center of the distribution. For example, many tests in parametric statistics, such as the one-sample t-test, are derived from the assumption that the data come from a normal population with an unknown mean. In a non-parametric study, the assumption of normality is eliminated.
Various statistical methods used for data analysis make assumptions about normality, including correlation, regression, t-tests, and analysis of variance. The central limit theorem states that when the sample has 100 or more observations, violation of normality is not a major issue; even so, for meaningful conclusions, the assumption of normality should be checked irrespective of the sample size. If continuous data follow a normal distribution, we present the data as a mean value, and this mean value is used to compare between or among groups and to calculate the significance level (P value). If our data are not normally distributed, the resulting mean is not a representative value of the data. A wrong selection of the representative value of a data set, and a significance level calculated using this representative value, might give a wrong interpretation. That is why we first test the normality of the data and then decide whether the mean is applicable as the representative value. If applicable, means are compared using a parametric test; otherwise, medians are used to compare the groups, using non-parametric methods. An assessment of the normality of data is a prerequisite for many statistical tests because normality is an underlying assumption in parametric testing. There are two main methods of assessing normality: graphical and numerical (including statistical tests). Statistical tests have the advantage of making an objective judgment of normality but the disadvantage of sometimes not being sensitive enough at low sample sizes or being overly sensitive at large sample sizes. Graphical interpretation has the advantage of allowing good judgment in situations where numerical tests might be over- or under-sensitive, although normality assessment using graphical methods needs a great deal of experience to avoid wrong interpretations; without such experience, it is best to rely on the numerical methods. There are various methods available to test the normality of continuous data; the most popular are the Shapiro-Wilk test, the Kolmogorov-Smirnov test, skewness, kurtosis, the histogram, the box plot, the P-P plot, the Q-Q plot, and the mean with SD. The two best-known tests of normality, the Kolmogorov-Smirnov test and the Shapiro-Wilk test, are the most widely used methods to test the normality of data (Mishra et al., 2019).
A test is said to be powerful when it has a high probability of rejecting the null hypothesis of normality when the sample under study is taken from a non-normal distribution. In making comparisons, all tests should have the same probability of rejecting the null hypothesis when the distribution is truly normal (i.e., they must have the same Type I error, α, the significance level) (Yap & Sim, 2011).
The Anderson-Darling test compares the empirical cumulative distribution function of the sample data with the distribution expected if the data were normal. If this observed difference is large enough, the test rejects the null hypothesis that the population is normal. It does not take rounding into account (De Almeida et al., 2020; de Araújo et al., 2021; F. da S. Gomes et al., 2022; Nelson, 1998).
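As an illustration only (the study itself used Minitab, not Python), the Anderson-Darling test just described can be run with SciPy. The sample below is synthetic, drawn from a deliberately skewed (exponential) distribution, so the test should reject normality:

```python
import numpy as np
from scipy import stats

# Synthetic, clearly non-normal (exponential) sample -- illustrative only
rng = np.random.default_rng(42)
sample = rng.exponential(scale=1.0, size=200)

result = stats.anderson(sample, dist='norm')

# critical_values pairs with significance_level = [15, 10, 5, 2.5, 1] (%)
crit_5pct = result.critical_values[2]
print(f"AD statistic = {result.statistic:.3f}, 5% critical value = {crit_5pct:.3f}")
if result.statistic > crit_5pct:
    print("Reject Ho: data do not appear to follow the Normal Distribution")
```

Note that SciPy reports critical values rather than a p-value; Minitab's output instead prints a p-value directly, but the decision rule is equivalent.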
The Shapiro-Wilk test is based on the correlation between the data and the corresponding normal scores and provides better power than the K-S test, even after the Lilliefors correction. Power, the ability to detect whether a sample comes from a non-normal distribution, is the most frequent measure of the value of a normality test. Some researchers recommend the Shapiro-Wilk test as the best choice for testing the normality of data (Ghasemi & Zahediasl, 2012).
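A minimal sketch of the Shapiro-Wilk test in SciPy, again on synthetic data (this is not the data or the software from the case study); here the sample is drawn from an actual Normal distribution, so the test should typically not reject normality:

```python
import numpy as np
from scipy import stats

# Illustrative only: a sample drawn from a true Normal distribution
rng = np.random.default_rng(7)
sample = rng.normal(loc=18.0, scale=0.2, size=30)

stat, p_value = stats.shapiro(sample)
print(f"W = {stat:.4f}, p-value = {p_value:.4f}")
# Decision rule: p-value > 0.05 -> no evidence against normality at 5%
```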
The Sign Test is used to analyze dependent samples and is therefore an alternative to the t-test for dependent samples. It is applied in situations where the researcher wants to determine whether two conditions are different (Kritchman & Nadler, 2008; Mazza et al., 2024).
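The mechanics of the two-tailed Sign Test for a hypothesized median can be sketched in a few lines of Python. The function name and the sample data below are illustrative, not from any specific package or from the company's data set; under Ho the number of "+" signs follows a Binomial(n, 0.5) distribution:

```python
from scipy import stats

def sign_test(data, hypothesized_median):
    """Two-tailed Sign Test: counts observations above (+) and below (-)
    the hypothesized median; ties are discarded. Under Ho the sign count
    follows Binomial(n, 0.5)."""
    plus = sum(1 for x in data if x > hypothesized_median)
    minus = sum(1 for x in data if x < hypothesized_median)
    n = plus + minus
    k = min(plus, minus)
    # Exact two-tailed p-value, capped at 1
    p_value = min(1.0, 2 * stats.binom.cdf(k, n, 0.5))
    return plus, minus, p_value

# Hypothetical sample: 8 observations above 18 and 4 below
data = [18.1] * 8 + [17.9] * 4
plus, minus, p = sign_test(data, 18.0)
print(plus, minus, round(p, 3))  # p-value ≈ 0.388
```

With 4 signs on the smaller side out of 12, this reproduces the two-tailed p-value of roughly 0.388 that appears in the case study later in the paper.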
The advantages of non-parametric tests are that they can be used in many different situations, since they do not have to obey strict parametric assumptions; their methods are generally simpler, which makes them easier to understand; and they can be applied to non-numerical data, making it easier to obtain the individual information most important and appropriate to the research process (Qualls et al., 2010; Zimmerman & Zumbo, 1993).
The mean can be calculated for any set of numerical values, but the standard deviation has a direct geometric meaning only for the normal curve: by definition, the standard deviation marks the inflection points of the normal curve. These inflection points are two in number and symmetrical in relation to the mean of the distribution. Asymmetrical curves lack this property because, even if they have inflection points, as many other mathematical curves do, those points are hardly symmetrical in relation to the mean. In short, even though experimental distributions may show some asymmetry, it must remain within certain limits that are acceptable in statistical terms, acceptable because they are attributable to chance variation determined by uncontrolled sampling errors, in other words, the chance variation typical of so-called random variables and samples. Non-parametric tests are not entirely free of assumptions about the data. For example, it is essential to assume that the observations in the samples are independent and come from the same distribution; furthermore, in two-sample experiments, the assumption of equal shape and dispersion is necessary. Non-parametric tests have the following limitation: they are generally less powerful than the corresponding parametric tests when the assumption of normality holds. Thus, one is less likely to reject the null hypothesis when it is false if the data come from a normal distribution.
Non-parametric tests often require you to modify the hypotheses. For example, most non-parametric tests on the center of the population are tests on the median instead of the mean. The test does not answer the same question as the corresponding parametric procedure if the population is not symmetrical. When there is a choice between a parametric and a non-parametric test and you are relatively certain that the assumptions for the parametric procedure are met, use the parametric procedure. It is also possible to use the parametric procedure when the population is not normally distributed if the sample size is large enough (Qualls et al., 2010).
Non-parametric methods are widely used in the study of populations that are taken in rank order (such as a movie that receives one to four star ratings). The use of non-parametric methods may also be necessary when the data has a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of scale, non-parametric methods result in data that is "in order". As non-parametric methods make fewer assumptions, their applicability is broader than that of the corresponding parametric methods. In particular, they can be applied in situations where less is known about the problem in question. In addition, due to their lesser dependence on assumptions, non-parametric methods are more robust. Another justification for using non-parametric methods is simplicity: in certain cases, even when the use of parametric methods is justified, non-parametric methods are easier to use. Due to both this simplicity and their greater robustness, non-parametric methods are seen by some in the statistical field as the methods that leave the least room for misuse and misunderstanding. The greater applicability and robustness of non-parametric tests come at a cost: in some cases where parametric tests would be appropriate, non-parametric tests have less statistical power. In other words, a larger sample may be needed to draw conclusions with the same degree of confidence (Zimmerman & Zumbo, 1993).

MATERIALS AND METHODS
This work can be classified as applied research, as it aims to provide improvements to the current literature, with normative empirical objectives, aiming at the development of policies and strategies that improve the current situation (Bertrand & Fransoo, 2002; de Araújo et al., 2021). The approach to the problem is quantitative, as is the modeling and simulation research method. The research stages were carried out following the sequence shown in Figure 1.

Case 1
This case presents a study carried out in a company in the South Fluminense Region, in the state of Rio de Janeiro. The company is a metallurgical plant, and the goal was to determine whether the average chromium content of stainless steel samples is equal to 18%. Twelve samples were randomly selected and the chromium content shown in Table 1 was measured. A Non-Parametric Test was used because there was no way of guaranteeing the statistical assumption of Normality.

RESULTS AND DISCUSSION
Case 1
First, the Basic Statistics were calculated. The statistical assumption of Normality was then tested and, as this assumption was not met, the Non-Parametric Sign Test was applied; both analyses were carried out in Minitab 19 Statistical Software (Bergmann & Ludbrook, 2000).
The values for the Mean, Standard Error of the Mean, Standard Deviation, Median, Asymmetry (Skewness) and Kurtosis were obtained and are shown in Table 2 below. If the Asymmetry and Kurtosis values were between -1 and 1, the distribution would be Symmetrical; as both values are above 1, it is an Asymmetrical Distribution, indicating the non-normality of the data.
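The rule of thumb above can be checked with SciPy. The sample below is hypothetical (the company's Table 1 values are not reproduced here) and includes one outlier, which drives both statistics above 1:

```python
import numpy as np
from scipy import stats

# Hypothetical chromium-content sample with one outlier (illustrative only)
sample = np.array([18.0, 18.1, 17.9, 18.0, 18.2, 17.8,
                   18.1, 18.0, 17.9, 18.1, 18.0, 25.0])

skewness = stats.skew(sample)
excess_kurtosis = stats.kurtosis(sample)  # Fisher definition: Normal -> 0

print(f"Skewness = {skewness:.2f}, Kurtosis = {excess_kurtosis:.2f}")
# Both well above 1 -> asymmetric distribution, normality is doubtful
```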
To continue testing Normality, the Anderson-Darling test is performed, which involves the following Hypotheses:
Ho: The data follow the Normal Distribution
H1: The data do not follow the Normal Distribution
The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated and account must be taken of this in adjusting either the test statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality. Anderson-Darling tests are also available for testing whether several collections of observations can be modelled as coming from a single population, where the distribution function does not have to be specified. The Anderson-Darling (AD) test is a modification of the Cramér-von Mises (CVM) test; it differs from the CVM test in that it gives more weight to the tails of the distribution (Mohd Razali & Bee Wah, 2011).
If the p-value is less than 0.05 (5%), Ho is rejected and H1 is accepted; otherwise, Ho is accepted.
Figure 1 shows that the p-value was less than 0.05 (5%), so Ho is rejected and H1 is accepted, indicating that the data are not Normally distributed. For the Sign Test, the number of signs on the smaller side was 4 and the Cumulative Binomial Probability was 0.1936, which, multiplied by 2 because the test is two-tailed, gives 0.388 (38.8%). This is the p-value for the Sign Test. If the p-value is greater than 0.05, there is no statistical evidence to reject Ho. The Hypotheses are as follows:
Ho: The Hypothetical Median is 18
H1: The Hypothetical Median is different from 18
Since 0.388 > 0.05 (the significance level), Ho is accepted, i.e. the median is 18.
In other words, it is not possible to conclude that the median chromium content of the population differs from 18%.
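The two-tailed p-value quoted above can be reproduced directly from the cumulative Binomial probability, with n = 12 samples and 4 signs on the smaller side under Ho (p = 0.5); the small difference from the 0.1936 quoted above is rounding:

```python
from scipy.stats import binom

# P(X <= 4) for X ~ Binomial(12, 0.5), then doubled for the two-tailed test
p_one_sided = binom.cdf(4, 12, 0.5)
p_value = 2 * p_one_sided

print(f"one-sided = {p_one_sided:.4f}, two-tailed p-value = {p_value:.4f}")
# two-tailed p-value ≈ 0.388 > 0.05 -> Ho is not rejected
```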
Another, easier alternative is to perform the Sign Test for the Median directly in Minitab, as shown in Figure 3. The p-value reported by the software is again greater than 0.05 (5%), so Ho is accepted and the Median value is 18.

FINAL CONSIDERATIONS
In conclusion, the Non-Parametric Sign Test is a very viable option when the assumption of normality cannot be guaranteed for a data sample. If the population variable analyzed does not follow a normal distribution and/or the samples are small, a non-parametric test can be applied. This test is also applicable to "before and after" situations in which each individual is observed twice: before and after a certain treatment. In this specific case of the company, it can be concluded that the Median is the best way to show the central tendency of the data, since the Mean was completely displaced from the central point due to the Outliers that emerged.

Figure 1 - Probability Graph of the Total Amount of Chromium

Table 1 - Total percentage of chromium in the samples analyzed

Table 2 - Basic Data Statistics