Measuring motivation to learn English as a foreign language: Validation of the LLOS-IEA. Sex and age differences in emotional regulations

The objective of this research was to translate into Spanish and adapt to a CLIL context the Language Learning Orientation Scale-Intrinsic Motivation, Extrinsic Motivation and Amotivation Subscales (LLOS-IEA), and to analyse its psychometric properties. The LLOS-IEA adaptation was administered to a total of 3355 students from Andalusia aged 11 to 17 years. After the translation and adaptation processes, item and internal structure analyses were conducted, comparing the 5- and 7-factor models. These were followed by a CFA and by reliability and validity analyses, which concluded that the 5-factor model showed the best fit. Further analyses showed the instrument to be sex-invariant and to have convergent validity. A multi-level model analysis was also conducted to study construct validity, yielding results consistent with those of similar studies.


Introduction
Because of its implications for education, motivation in foreign language learning (FLL) has been analyzed from different approaches. Self-Determination Theory (SDT) plays a central role in this study, as "over the years the theory - and particularly its two linchpins, intrinsic and extrinsic motivation - has become one of the most influential constructs in motivational psychology" (Dörnyei & Ryan, 2015, p. 82).
The SDT is considered a meta-theory of human motivation that studies the degree to which human behaviour is self-determined; it "focuses on types, rather than just amount, of motivation" (Deci & Ryan, 2008, p. 182). This theory distinguishes two main types of motivation: intrinsic (IM), which is related to the inherent satisfaction of taking part in an activity, and extrinsic (EM), which "refers to the performance of an activity in order to attain some separable outcome" (Ryan & Deci, 2000, p. 71). Both IM and EM have been divided into different sub-categories of motivation (Vallerand, 1997; Vallerand et al., 1989; Vallerand et al., 1992). Amotivation (AM), by contrast, occurs when individuals cannot find any relation between their actions and the consequences of their behaviour. Several studies in the foreign language (FL) field have positively linked IM with outcomes such as self-confidence (Pae, 2008), school performance (Morales Rodríguez, 2011), intention to continue with FL studies and anxiety reduction (Noels et al., 1999; Noels et al., 2000). This study is set in a Content and Language Integrated Learning (CLIL) context, defined as a "dual-focused educational approach in which an additional language is used for the learning and teaching for both content and language" (Coyle et al., 2010, p. 1). Much research drawn from different contexts shows the positive effect of this approach on student motivation; some examples are the studies of Lasagabaster (2011) in the Basque Country, Seikkula-Leino (2007) in Finland, Sylvén and Thompson (2015) in Sweden, Mearns (2012) and Hunt (2011) in the UK, and Hewit and García-Sánchez (2012) in the Spanish province of Almeria. However, there is a lack of research analysing the psychometric properties of motivational instruments in Spanish in either the FLL or the CLIL domain.
Regarding SDT questionnaire design, the introduction of the "Language Learning Orientation Scale-Intrinsic Motivation, Extrinsic Motivation and Amotivation Subscales" (LLOS-IEA; Noels et al., 2000) opened the door to the study of IM and EM in the domain of L2 (second language) learning. This scale was based on the "Academic Motivation Scale" (Vallerand et al., 1989; Vallerand et al., 1992). From this point, several scholars designed and/or adapted questionnaires derived from the SDT in the field of L2. Maherzi (2011) proposed a trans-cultural validation for university students in Saudi Arabia. Lucas et al. (2010) created a version with IM factors for each of the four linguistic skills (speaking, listening, reading and writing). Finally, Ardasheva et al. (2012) designed the English Language Learner Motivation Scale (ELLMS), conceived for pre-university bilingual contexts.
It should be taken into account that, during the last decade, the CLIL approach has spread all over Europe, to the extent that most countries have applied it in some of their schools (Eurydice, 2008). In this scenario, Coyle (2010) stated that Spain was "rapidly becoming one of the European leaders in CLIL practice and research" (p. xiii). In the region of Andalusia in particular, this approach has been widely implemented, with a total of 947 bilingual schools (Consejería de Educación, 2015a) that served 321,685 CLIL students in the school year 2015 (Consejería de Educación, 2015b). Moreover, given the importance of the Spanish language all over the world, the study of the psychometric properties and the validation of an adapted Spanish LLOS-IEA would open multiple research opportunities, besides providing an instrument to measure motivation in FL students.

Instruments
A translation and adaptation of the scale designed by Noels et al. (2000) was used. That scale was based on the "Academic Motivation Scale", developed by Vallerand et al. (1989) and originally adapted and translated into English by Vallerand et al. (1992). The original instrument assesses "Amotivation" (α=.82); the three types of EM: "External Regulation" (α=.75), "Introjected Regulation" (α=.67) and "Identified Regulation" (α=.87); and the three types of IM: "Knowledge" (α=.85), "Mastery" (α=.88) and "Stimulation" (α=.85). The scale includes three items for each of its seven factors, and respondents indicate the extent to which they agree with each item on a 7-point Likert scale ranging from 1 = Does not correspond at all to 7 = Corresponds exactly.

Process of adaptation and translation
The translation of the LLOS-IEA followed the international methodological standards that the International Test Commission (ITC) recommends for adapting tests and scales from one culture to another (Hambleton et al., 2005; Muñiz, 2000; Muñiz & Bartram, 2007). To ensure precision, direct and back translation of the items were performed (Brislin, 1970, 1986). Following the parallel back translation procedure (Brislin, 1986), two translators independently produced one version each in the target language (Spanish); these versions were later re-translated into English by two professionals who were not aware of the original work. The quality of the work was assessed in terms of its similarity to the original version (Hambleton et al., 2005), and hardly any modifications were needed, as both versions were almost identical.
The LLOS-IEA was adapted from a FLL context to a context of learning "through" a FL by rephrasing some items when required (e.g. study a second language was replaced by study "in" English). Subsequently, a qualitative evaluation (content validity) of the work was undertaken by five experts (Osterlind, 1989): two in scale design and three in the construct assessed. They were provided with an item specification table (Calabuig & Crespo, 2009; Spaan, 2006), which included the semantic definition of the construct, its components and a list of the original and adapted items. These experts judged each item's weightiness in its domain using a scale from 1 (not at all) to 4 (absolute), and they also assessed each item's suitability. They had the opportunity to note any concern, annotation or alternative wording for any of the items.
The items that scored mean values <2.5 in suitability (Nuviala Nuviala et al., 2008) were revised according to the experts' reviews, and if four out of five experts did not classify an item within its theoretical dimension, it was adapted again so that it would clearly and accurately express that dimension. The overall item concordance of comprehensibility and suitability was measured through the Intraclass Correlation Coefficient (ICC) from a two-way mixed model assuming absolute agreement. The values obtained were ICC=.56 for item suitability and ICC=.83 for item weightiness.
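As an illustration of how an absolute-agreement ICC of this kind can be obtained, the following minimal sketch computes the single-measure and average-measure coefficients from the two-way ANOVA mean squares (McGraw and Wong's ICC(A,1) and ICC(A,k)). The source does not state which of the two forms was reported, and the function name and the ratings matrix are hypothetical; this is not the SPSS output of the study.

```python
import numpy as np

def icc_absolute_agreement(ratings):
    """Two-way ICC for absolute agreement.

    ratings: 2-D array-like, rows = items being judged, columns = experts.
    Returns (single-measure ICC, average-measure ICC).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                      # n items, k raters
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between items
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average

# Hypothetical ratings: 4 items judged by 5 experts on the 1-4 scale.
example = [[4, 4, 3, 4, 4],
           [2, 3, 2, 2, 3],
           [3, 3, 3, 4, 3],
           [1, 2, 1, 1, 2]]
print(icc_absolute_agreement(example))
```

Because the coefficient is computed for absolute agreement, a systematic difference between experts (one rater scoring every item one point higher, for instance) lowers the value even when the rank order of the items is identical.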
The new version was administered to 55 CSE and pre-university students aged between 12 and 18 using various options of density, item separation and general formatting (Dörnyei, 2003), which led to minor modifications. The final version of the LLOS-IEA was obtained after an analysis of the psychometric results and one last revision carried out by the research team.

Procedure
After permission was obtained from the school administrators, the questionnaire was administered, and participants were informed of the anonymous and voluntary nature of participation. This research also has ethical approval. Data collection took place between January and March 2016; each session lasted about twenty minutes, and concerns about comprehension were addressed throughout that time. In accordance with the Declaration of Helsinki (2008), all respondents were briefly informed about the purpose of the study and their rights as participants, and were given the opportunity to withdraw from the survey at any time.

Data analysis
First, an item analysis and an analysis of scale homogeneity were performed, which included Cronbach's alpha (α) for each dimension and, for each item, the mean (M), standard deviation (SD), corrected item-total correlation coefficient (CITC-c), correlation between the item and its dimension (CC), Cronbach's alpha if the item were deleted, kurtosis and skewness. SPSS v. 21 for Mac OS X was used for this analysis.
Afterwards, as part of an exploratory factor analysis (EFA), a principal component analysis (PCA) was performed, extracting a fixed number of seven factors following the structure of the original instrument (Noels et al., 2000). Subsequently, a second PCA was conducted with no fixed number of factors to extract.
Later, to assess whether the data distribution was normal, an analysis based on the Relative Multivariate Kurtosis (RMK) from PRELIS was performed with the LISREL 8.80 programme. To confirm the dimensional structure of the scale, the factor structure of the instrument was assessed with CFA using the Weighted Least Squares (WLS) estimation method for ordinal variables in LISREL 8.80 (Jöreskog & Sörbom, 2003). In addition to the factor structure of the original instrument, two other models, with 5 and 7 factors, were also compared. Regarding reliability and validity, in addition to the α value, the Composite Reliability and the Average Variance Extracted (AVE) for each dimension were also calculated.
Last, the convergent validity, the construct validity and the sex invariance were determined. To study the construct validity, a multi-level analysis was performed. The LLOS-IEA factors were selected as the dependent variables, and the students' sex and age were the factors of this mixed-model multi-level analysis.

Item analysis and scale homogeneity
The statistical analysis of the items retained the item-factor distribution of the original instrument. The criteria for retaining items were: CITC-c ≥ .30, SD > 1, and all possible responses used at least once (Nunnally & Bernstein, 1995). Kurtosis and skewness should also be close to 0 and <7 (Curran et al., 1996; table 1). Items from factor 1 (AM) showed M values between 1.58 (item 19) and 1.75 (item 4). All the SD values were >1, and this dimension's internal consistency was satisfactory (α=.783). All the CITC-c values were ≥.58.
With regard to EM, the items from factor 2 (EM-external regulation) presented mean values from 5.89 (item 21) to 6.43 (item 14), SD values were >1, and this dimension's internal consistency was suitable (α=.787). However, the removal of item 21 would increase α to .81. All the CITC-c values were ≥.58. For factor 3 (EM-introjected regulation), the mean values ranged from 3.10 (item 11) to 4.36 (item 15). SD values were >1, and this dimension's internal consistency was nearly satisfactory (α=.694), but the removal of item 15 would raise α to .71. All the CITC-c values were ≥.42. Finally, factor 4 (EM-identified regulation) showed mean values from 5.34 (item 5) to 5.80 (item 1). SD values were >1, and this dimension's internal consistency was nearly satisfactory (α=.631); nevertheless, the α value of this factor without item 1 would be .80. All the CITC-c values were ≥.33.
Some authors, such as Carretero and Pérez (2005), recommend performing a correlation study in order to guarantee each dimension's homogeneity (CC). In this work, the correlations between each item's score and its overall component's score were CC≥.33. It is worth mentioning that items 4, 10, 19, 14 and 20 presented higher kurtosis or skewness values, which were taken into consideration in the analyses described later. However, Curran et al. (1996) state that the kurtosis value could reach as far as 7.0. The rest of the items fell within the acceptable range and were therefore retained.
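The item statistics discussed in this section (Cronbach's alpha, the corrected item-total correlation and alpha if an item is deleted) can be reproduced with a few lines of code. The sketch below is illustrative only: the function names and the small response matrix are hypothetical, and the values reported in this study were obtained with SPSS.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items matrix of one dimension."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1 - sum_item_var / total_var)

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    x = np.asarray(items, dtype=float)
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1]
        for j in range(x.shape[1])
    ])

def alpha_if_item_deleted(items):
    """Cronbach's alpha of the dimension recomputed without each item."""
    x = np.asarray(items, dtype=float)
    return np.array([
        cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])
    ])

# Hypothetical answers of six students to a three-item dimension (1-7 scale).
answers = [[6, 7, 5], [5, 6, 6], [7, 7, 4], [3, 4, 6], [6, 6, 5], [2, 3, 7]]
print(cronbach_alpha(answers))
print(corrected_item_total(answers))
print(alpha_if_item_deleted(answers))
```

An item whose removal raises alpha, as reported above for items 1, 15 and 21, shows up in the last function as a value larger than the alpha of the full dimension.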

Internal structure analysis
Following the validation process of the original instrument (Noels et al., 2000), an EFA for the seven-factor model was performed first. A PCA was conducted, requiring a minimum correlation of .40 in order to consider each item important within the factor (Stevens, 1992). The Kaiser-Meyer-Olkin (KMO) index was good (.892) and Bartlett's sphericity test was significant (χ²(210)=29186.601, p<.001), supporting the suitability of the EFA. The results confirmed a 7-factor extraction accounting for 70.432% of the total variance explained (table 2).
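For readers who wish to check sampling adequacy on their own data, the two indices reported above can be sketched as follows. This is an illustrative implementation of the textbook formulas, not the SPSS routine used in this study; the function names and the simulated data are hypothetical. With the 21 items of the scale, Bartlett's test has p(p-1)/2 = 210 degrees of freedom, which agrees with the df reported.

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test of sphericity: returns (chi-square, degrees of freedom)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)          # item correlation matrix
    _, logdet = np.linalg.slogdet(r)          # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo_index(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    x = np.asarray(data, dtype=float)
    r = np.corrcoef(x, rowvar=False)
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                    # partial correlations
    off = ~np.eye(r.shape[0], dtype=bool)     # off-diagonal mask
    r2 = np.sum(r[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

# Simulated (hypothetical) data: 500 respondents, 21 items.
rng = np.random.default_rng(42)
common = rng.normal(size=(500, 1))
simulated = common + rng.normal(size=(500, 21))
print(bartlett_sphericity(simulated), kmo_index(simulated))
```

The p-value follows from comparing the statistic with a chi-square distribution with the returned degrees of freedom.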
The results confirmed a 7-factor dimensional structure with factor loadings above .41. However, the items from IM-knowledge and IM-achievement merged into one dimension. Moreover, contrary to the original instrument, items 1 and 2 fell outside the EM-identified regulation and IM-knowledge dimensions, respectively. It is worth mentioning that the factor loading of item 2 was .41.
Bearing in mind that both merged factors belong to the IM group, and that other investigations have analysed 5-factor models for similar instruments (Li & Harmer, 1996; Ntoumanis, 2001), a new PCA was conducted. In this case, no fixed number of factors to extract was indicated, and a correlation of at least .40 was required in order to consider each item important within the factor (Stevens, 1992). The KMO index was good (.890) and Bartlett's sphericity test was significant (χ²(210)=29278.425, p<.001), supporting the suitability of the EFA. This analysis yielded five factors, accounting for 62.287% of the total variance explained (table 2). It should be noted that item 1, which was formerly situated outside the theoretical model, now belongs to the IM factor, although with lower h² and factor loading values, a fact to be considered in future analyses.
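The difference between the two extractions (a fixed number of components versus no fixed number) can be illustrated with a short sketch based on the eigenvalues of the item correlation matrix. The source does not state which retention rule was applied when no number of factors was fixed; the eigenvalue-greater-than-one (Kaiser) rule used here is an assumption, and the function name and simulated data are hypothetical.

```python
import numpy as np

def pca_explained_variance(data, n_components=None):
    """Share of total variance explained by the retained principal components.

    If n_components is None, components with eigenvalue > 1 are retained
    (Kaiser rule; an assumption, not confirmed by the source).
    Returns (number of components retained, proportion of variance explained).
    """
    x = np.asarray(data, dtype=float)
    r = np.corrcoef(x, rowvar=False)
    eigvals = np.linalg.eigvalsh(r)[::-1]     # eigenvalues, largest first
    if n_components is None:
        n_components = int(np.sum(eigvals > 1.0))
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return n_components, explained

# Simulated (hypothetical) data: 500 respondents, 21 items.
rng = np.random.default_rng(7)
simulated = rng.normal(size=(500, 1)) + rng.normal(size=(500, 21))
print(pca_explained_variance(simulated, n_components=7))  # fixed extraction
print(pca_explained_variance(simulated))                  # rule-based extraction
```

Retaining fewer components necessarily explains a smaller share of the variance, which is why the second solution reported above accounts for less variance than the seven-factor one.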

Confirmatory factor analysis
In order to study the psychometric properties of the original LLOS-IEA dimensionalization (Noels et al., 2000), structural equation modelling was performed. Different absolute and relative fit indices were calculated (Bentler, 2007; Markland, 2007), such as the p-value associated with the Chi-square test, the ratio of χ² to degrees of freedom (χ²/df), the goodness of fit index (GFI), the normed fit index (NFI), the non-normed fit index (NNFI), and the comparative fit index (CFI). The estimated parameters were considered significant when the associated t-value was higher than 1.96 (p < 0.05).
Firstly, an RMK analysis was conducted with this scale, which yielded a Mardia-Based-Kappa value of .317. Test results showed that multivariate normality could not be accepted (upper limit=1.006; lower limit=.994), which implied the use of a robust estimator. Therefore, a weighted least squares (WLS) estimation method for ordinal variables was applied in the LISREL 8.80 program (Jöreskog & Sörbom, 2003). The polychoric correlation matrix and the asymptotic covariance matrix were used as input for data analysis. Following Markland's (2007) suggestion of formulating several models if recommended by the data (see tables 1 and 2), three different models were hypothesized. Model 1 would follow the item-factor distribution of the original instrument; model 2 would dismiss the items whose removal would improve the α value (items 1, 15, 17 and 21); and model 3 would consider items 2, 11, 16, 17, 18, 14, 20, 4, 10, 19, 9, 11, 5 and 6. The latter was a five-factor model (figure 1) in line with other works in the sport field (Li & Harmer, 1996; Ntoumanis, 2001) or the FLL domain, such as the instrument of Ardasheva et al. (2012), which contained a single factor for IM.
Most of the items presented individual reliability (R²) values >.5 in the three models, the lowest values being in item 16 for model 1 (R²=.431) and in item 2 for model 2 (R²=.431). Table 3 displays the goodness of fit of the three models. Nevertheless, only in the five-factor model was the χ²/df value <5.00, confirming the fit of this model (Hu & Bentler, 1999). Considering each model's expected cross-validation index (ECVI) and Akaike information criterion, which reflect model parsimony as well as fit, the five-factor model showed the best fit, as its values were the lowest.

Figure 1
Path diagram of the CFA, with standardized weights and measurement errors of each of the items in the Spanish LLOS-IEA version for CLIL students.

Table 4 shows the reliability and validity of model 3. Model 2, except for the χ²/df value, also showed a good fit; its results are therefore also presented, as they might be taken into consideration in future studies.

Reliability and validity
Factors 3 and 4 in model 3 could be questioned because their α values are <.70; nevertheless, given the limited number of items per factor (2), these values would be acceptable (Taylor et al., 2008). Moreover, composite reliability is considered more suitable than Cronbach's alpha, as it does not depend on the number of attributes associated with each concept (Vandenbosch, 1996); therefore, factors 3 and 4 in model 3 would show adequate reliability.
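For reference, composite reliability and AVE can be derived from the standardized factor loadings of a dimension, as in the following minimal sketch. It assumes standardized loadings and uncorrelated measurement errors (so that each error variance equals one minus the squared loading); the function names and the loadings are hypothetical and are not the estimates of this study.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one factor."""
    lam = np.asarray(loadings, dtype=float)
    error_var = 1.0 - lam ** 2                # error variance per item
    return lam.sum() ** 2 / (lam.sum() ** 2 + error_var.sum())

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one factor."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

# Hypothetical two-item factor, like factors 3 and 4 in model 3.
two_item_factor = [0.78, 0.74]
print(composite_reliability(two_item_factor))
print(average_variance_extracted(two_item_factor))
```

Unlike Cronbach's alpha, these indices weight each item by its own loading rather than treating all items as equally related to the factor.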

Convergent validity
Regarding convergent validity, some authors, such as Bollen (1989), argue that validity can be estimated from the magnitude of the factor loadings in addition to the adjusted goodness of fit index (AGFI; table 5). Furthermore, as stated before, all the factor loadings were statistically significant (t-value >1.96); hence, all the indicators assessed the same theoretical construct. Finally, it is worth mentioning that the values for all the items in model 3 were high (R²>.50).

Construct validity
Lastly, a multi-level model analysis was performed. Several models were tested considering province, school and grade, and the model by school and grade was finally selected because it had the best BIC (10224.732). Table 6 displays the outcome of the mixed-model multi-level analysis. The estimated mean values by sex and age (grouped in school cycles), adjusted for school and grade, are presented. The stipulated student age is 12-13 for the 1st cycle, 14-15 for the 2nd cycle and 16 for 1st bachillerato. This table also shows the standard error, the 95% confidence interval, and the statistical test corresponding to the model, in which the hypothesis of equal means in the dimensions across the categories of the independent variable is tested.
This table also includes the difference between the response and reference categories, and the p-value associated with the statistical tests comparing the corrected marginal means, adjusted for multiple comparisons with the Šidák correction. Concerning the AM dimension, the only differences observed were by student sex (p<.001), AM being higher in boys (M=1.940; SE=.067; adjusted difference=.310), with a large F value (F=56.60).
With regard to EM-external regulation, there were differences by sex (p<.001) and age (p<.001), the values being higher in girls and in 1st cycle students, respectively.
For EM-introjected regulation, no significant differences were found. Finally, important differences were found by sex (F=65.73; p<.001) and age (F=65.73; p<.001) for IM, the highest mean value being in first cycle students (M=4.943; SE=.029).
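The multiple-comparison adjustment mentioned above can be illustrated with a minimal sketch of the Šidák formula, in which each raw p-value is adjusted according to the number of comparisons in the family. The function name and the p-values are hypothetical; the adjusted values reported in table 6 come from the statistical package used in the study.

```python
def sidak_adjust(p_values):
    """Šidák-adjusted p-values for a family of m comparisons.

    Each adjusted value is 1 - (1 - p) ** m, where m is the number of
    comparisons in the family.
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

# Hypothetical raw p-values for three pairwise comparisons between age groups.
raw = [0.004, 0.020, 0.300]
print(sidak_adjust(raw))
```

The adjustment is slightly less conservative than the Bonferroni correction, which multiplies each p-value by the number of comparisons.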

Sex invariance
In order to analyze the factorial invariance, the recommendations of Abalo et al. (2006) were followed, estimating the same model for both samples. No significant differences were found in χ² between models, so the null hypothesis of invariance was not rejected. However, due to the sensitivity of χ² to sample size, the criteria of Cheung and Rensvold (2002) regarding ΔCFI were also applied. According to these authors, ΔCFI values ≤ .01 indicate that the null hypothesis should not be rejected; in the present study, ΔCFI=.005. Finally, the rest of the results indicate that the measurement properties remain sex-invariant.

Discussion
The main objective of this work has been to study and validate the LLOS-IEA for a Spanish-speaking context. Taking the study of Fernández-Barrionuevo and Baena-Extremera (2018) as a starting point, the validation of this instrument will allow us to obtain information about student FL motivation in CLIL and regular schools. This is of great interest given the expansion of the Spanish language, with a total of about 560 million speakers all over the world.
After a first internal structure analysis, a 7-factor dimensional structure similar to that of the original instrument (Noels et al., 2000) was found to account for 70.432% of the variance. Nevertheless, several factors presented issues worth considering. First, IM-accomplishment and IM-knowledge merged into a single dimension; second, items 1 and 2 were grouped in a single dimension situated outside the theoretical model. Because of this, a second internal structure analysis was performed with no fixed number of factors to extract, yielding 5 dimensions in which all the items were placed in their corresponding dimension, accounting for 62.287% of the total variance.
As can be observed in table 1, the removal of items 1, 15, 17 and 21 would improve the Cronbach's alpha values of their respective factors. This was taken into consideration by conducting a 7-factor CFA without these items (model 2), compared with the structure of the original 7-factor instrument (model 1) and a 5-factor version (model 3). Among these proposals, the lowest ECVI and Akaike values were found in model 3, which indicated the best fit. Moreover, even though the χ²/df value of model 2 was close to 5, only model 3 presented a value under 5, as recommended.
Although the composite reliability and AVE values are above the minimum required in models 2 and 3 (Hair et al., 2009), model 3 not only shows the best fit, but also better convergent validity. However, model 2 should also be considered in future research, as its CFA and its validity and reliability indices are close to acceptable.
Even though we have not found any study based on a 5-factor model of the LLOS-IEA, there are other instruments also based on the EME (Vallerand et al., 1989) with this factor structure, such as the Spanish 5-factor version of the SMS (Duda, 2007; Granero-Gallegos et al., 2014). Moreover, as regards the FL, the scale designed by Ardasheva et al. (2012) shares some features with our instrument, for instance the use of a single factor for IM and the contextualization in pre-university bilingual students.
In relation to construct validity, except for EM-introjected regulation, all the motivational variables were significantly higher for girls, while AM was significantly higher for boys. These results are in line with other studies that also indicate higher IM values in schoolgirls (Kissau, 2006; Okuniewski, 2014; Williams et al., 2002). The cause of these sex differences could lie in the feminized bias of the FL school domain (Williams et al., 2002). This idea is supported by McCall's (2011) study, in which combining FLL with a topic of interest to boys, such as football, improved their IM.
Regarding the higher FL motivation in girls, Williams et al. (2002) stated that "comments elicited within individual interviews suggest that this [the positive attitude towards FLL] might be part of a more general orientation toward schoolwork rather than necessary relating only to learning languages" (p. 522). Therefore, another cause of the higher FL motivation in girls could be a higher academic motivation in general. According to Vallerand's (1997) Hierarchical Model, motivation in a certain context, such as the school, could have an effect on another context, for instance FLL.
To conclude, the results indicate that the 5-factor Spanish version of the LLOS-IEA adapted for CLIL schools is a valid, reliable and sex-invariant instrument. However, it would be advisable to test the 7-factor model in future investigations. This instrument will open new fields of research by allowing us to obtain motivational information on Spanish-speaking CLIL and FLL students that could not be accessed before. In this way, investigations will be able to delve deeper into one of the most important underlying factors of FLL, student motivation, treating study "through" a FL as a specific domain. In addition, teachers will have a valid tool to measure student motivation when learning a FL or learning through a FL.