Comparative Evaluation of Usability between QWERTY-Based Arabic and Non-QWERTY-Based Arabic Keyboard Layout: Empirical Evidence

QWERTY-based Arabic keyboard layouts have been developed to assist QWERTY users in typing Arabic. However, there is a lack of empirical evidence on the comparative usability of such layouts and the common non-QWERTY-based Arabic keyboard layout. Our study provides this evidence by examining the usability of a QWERTY-based Arabic keyboard layout (QB) and the common non-QWERTY-based Arabic keyboard layout (NQB) from the perspective of QWERTY users, and by comparing the evaluation results of the two layouts. In experiments using within-subjects and between-subjects designs, QB was significantly better than NQB in efficiency and learnability. QB also enabled more effective typing in almost all experiment designs; the exception, observed in one between-subjects study, is analyzed. For overall usability, most participants subjectively preferred QB to NQB.


Introduction
Arabic is the official language of at least 22 states [1] [2], the fifth [3] or sixth [4] most spoken language in the world, and one of the six official languages of the United Nations [5]. It is also the liturgical language of over 1.6 billion Muslims [6] [7] because the Qur'an and Hadith as the primary source of Islamic teachings were written in Arabic. All these facts indicate the importance and wide use of Arabic writing for communication, learning of Islamic teachings, and development of many Islamic subdisciplines. Consequently, the text writing in Arabic takes place not only in countries of its native speakers but also in areas where Islamic teachings are being studied, practiced, and developed.
One of the technologies to support text writing, including that of Arabic, is a computer keyboard. It is one of the primary input devices for a computer which uses an arrangement of buttons or keys, usually modelled after the typewriter keyboard. An important aspect of keyboards is the keyboard layout, that is any specific arrangement of the keys, legends, or key-meaning associations (respectively) of a computer keyboard [8]. This arrangement can be mechanical/physical [9][10], visual [9], or functional/logical [9] [11]. The mechanical/physical layout is the placements of keys of a keyboard [9] [10]. The visual layout shows the arrangement of the legends (labels, markings, or engravings) that appear on the keys of a keyboard [9]. The large number of formal and informal Islamic schools where Arabic is being taught [22] [25].
Taking Indonesia as a case study and Indonesians as participants is considered appropriate for two main reasons. First, the Indonesian language uses the Latin alphabet, and most Indonesians are more familiar with the QWERTY keyboard layout than with any other layout. This matches our motivation to examine the usability of keyboard layouts from the perspective of users who are already familiar with QWERTY. Second, as Arabic is widely used in the development and practice of Islamic studies, the need for typing in Arabic is also high in Indonesia. Interviews with several teachers and practitioners of Arabic and Islamic studies indicated some difficulty in using NQB. This motivates our study to provide quantitative evidence comparing the usability of NQB and QB, which may inform the future use or further analysis of the layouts under evaluation.
Theoretical Background

The Common Arabic Keyboard Layout
The origin of the Arabic keyboard layout has been linked with the invention of the first Arabic layout for typewriters in 1899 [26]. This layout was the basis of over 20 variants built by computer companies in the 1970s and 1980s [27] [28]. Although the Arab Standardization and Metrology Organization (ASMO) developed a standard for the Arabic keyboard layout to counter the risks created by the existence of too many variants, one of the variants, the Arabic Microsoft/IBM PC layout, had already been widely accepted and adopted by the market [27]. This layout is still widely used today, and as such we call it the common Arabic keyboard layout. Fig. 1 shows the common Arabic keyboard layout on the IBM PC/Windows standard 101 layout [29], which includes main letters and diacritics. This layout is available as an input method in the Windows operating system. When used on a physical QWERTY-based keyboard, the input method must be set to Arabic 101 accordingly. Table 1 shows the map of the Arabic main characters in alphabetic order, starting with the diacritics, together with their corresponding Unicode values and the Latin keys on a QWERTY keyboard. This common Arabic layout was not specifically designed for performance in computer use. It is also assumed to be potentially difficult for QWERTY users to learn. For ease of reference when comparing it with the QWERTY-based Arabic keyboard layout, this common Arabic keyboard layout is also referred to as NQB (non-QWERTY-based layout) in our study.

QWERTY-Based Arabic Keyboard Layouts
There have been several designs of QWERTY-based Arabic keyboard layouts that map Arabic characters to their phonetically close Latin keys, e.g. [14], [15], [16], [17], [18] and [19]. Given their various dates of releases, some of them may not be related to the others. An explicit association comes from Al Zabir layout designer [19] mentioning that they sought to improve the previous design by Intellaren (Intellark [14]). For the sake of brevity, in this study the family of QWERTY-based Arabic layouts is referred to as QB.
Among the QB variants released, Intellark provides the most comprehensive coverage of its background, design, and guidelines. Intellaren, its designer, identified several obstacles in using NQB, such as (1) the same phonetic sound located on different keys; (2) the same letter family placed in many different locations; (3) shape- and sound-related letters placed far apart from each other; and so forth [30]. To overcome these obstacles, Intellark considers several factors in its elaborate design process, including phonetic similarity, shape similarity, Arabic alphabetic ordering within letter blocks, and the frequency distribution of Arabic letters [14]. The design of Intellark on the QWERTY layout, showing Arabic and Latin main letters, is presented in Fig. 2. Intellark uses a one-to-many mapping that maps one or more Arabic characters to each Latin key. When more than one Arabic character is mapped to a Latin key, the character produced depends on the number of key presses and their timing. Pressing the key once produces a certain character, and pressing it a certain number of times rapidly within a time tolerance (a fraction of a second) produces another, less frequent character related to the main character of the key. Intellark gives priority to characters of high frequency, based on a given frequency analysis, when mapping English keys to Arabic characters [30]. Characters of higher frequency need fewer key presses to produce, and vice versa. Intellark also uses the Shift key to give faster access to characters of low frequency. Table 2 shows the mapping from Arabic characters to their corresponding Latin keys and to the number of key presses in both unshifted and shifted conditions.
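The multi-press mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Intellark's actual implementation: the table `HYPOTHETICAL_MAP`, the 0.3-second tolerance, the function name `decode_presses`, and the choice to replace the previously emitted character on a rapid repeat are all hypothetical. The real assignments are those listed in Table 2.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping for illustration only; the real assignments are in Table 2.
# For each Latin key, characters are ordered from most to least frequent.
HYPOTHETICAL_MAP: Dict[str, List[str]] = {
    "s": ["\u0633", "\u0634"],  # seen, sheen
    "t": ["\u062a", "\u0637"],  # teh, tah
}

TOLERANCE_S = 0.3  # assumed time window between rapid presses, in seconds


def decode_presses(events: List[Tuple[str, float]]) -> str:
    """Turn (key, timestamp) press events into output characters.

    A repeat of the same key within TOLERANCE_S is treated as part of one
    multi-press and selects the next, less frequent character on that key.
    """
    output: List[str] = []
    last_key: Optional[str] = None
    last_time = 0.0
    count = 0
    for key, t in events:
        chars = HYPOTHETICAL_MAP.get(key)
        if chars is None:
            output.append(key)  # unmapped keys pass through unchanged
            last_key, count = None, 0
            continue
        if key == last_key and t - last_time <= TOLERANCE_S and count < len(chars):
            count += 1
            output[-1] = chars[count - 1]  # replace with the next candidate
        else:
            count = 1
            output.append(chars[0])
        last_key, last_time = key, t
    return "".join(output)
```

Under these assumptions, two presses of `s` 0.1 seconds apart yield the second character on that key, while two presses one second apart yield the first character twice.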

Usability evaluation of keyboard layouts
ISO 9241-210 defines usability as "extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [32]. Here, effectiveness concerns the accuracy and completeness with which users achieve specified goals, efficiency concerns the resources expended in relation to that effectiveness, and satisfaction refers to freedom from discomfort and positive attitudes towards the use of the product [31] [32]. Nielsen's definition of usability [33] includes different but overlapping elements, i.e. learnability, efficiency, memorability, errors, and satisfaction. However, ISO 9241's and Nielsen's concepts of usability cover the same overall area of concern and share the same contextual application. The choice of usability measures for each usability aspect (effectiveness, efficiency, and satisfaction) depends on the specified usability objectives, product requirements, and organization needs [31] [34]. Although usability is not always explicitly mentioned, common measures used in previous work on keyboard evaluation are related to usability aspects. Some examples are accuracy [35] [41]; learning time [40]; familiarity [42]; likability, comfort, and learnability [35]; and other more complex measurements in [42] and [43].
In this study we adopted effectiveness and efficiency from ISO 9241 [32] and learnability from Nielsen [33] as the main usability elements for evaluating the keyboard layouts because they are relevant to keyboard layouts and feasible to apply. We included accuracy in effectiveness, typing speed in efficiency and independence from user manuals in learnability as measurements. Both objective and subjective data were taken from the participants. We learned that similar measurements were successfully employed for comparing several keyboard layouts, such as in [35] for accuracy and typing speed of linear QWERTY keyboard layouts, in [36] for relative error rate of a smartwatch keyboard layout, and in [40] for typing speed, error rate, and learning time of Bangla keyboard layouts. However, the different contexts and limitations of resources caused some variations in the detailed formula and application of those measurements. Further details about the measurement in this study are discussed in the next section.

Evaluation Design
The goal of the evaluation is to compare the usability of QB with that of NQB. Since, as discussed earlier, QB was proposed to overcome the difficulty of using NQB, we offer the following general hypothesis:
H1. The usability of the QWERTY-based Arabic keyboard layout (QB) is higher than that of the non-QWERTY-based Arabic keyboard layout (NQB).
The concept of usability applied in this evaluation consists of three subconcepts, namely effectiveness, efficiency, and learnability. Therefore, H1 can be further elaborated into the following hypotheses:
H1a. QB is more effective than NQB.
H0a. There is no difference in effectiveness between QB and NQB.
H1b. QB is more efficient than NQB.
H0b. There is no difference in efficiency between QB and NQB.
H1c. QB is more learnable than NQB.
H0c. There is no difference in learnability between QB and NQB.
Experimental study and usability testing were the main strategy and method, respectively, employed in this evaluation. An experiment consisting of several usability tests was conducted to test the hypotheses. The experiment used both between-subjects and within-subjects designs. The participants were split into two groups, A and B. Each group performed under two different conditions corresponding to the two types of keyboard layout, QB and NQB. The sequence of conditions differed between the groups. In group A the participants performed usability testing on NQB in the first session and on QB in the second session. Conversely, in group B the participants performed the test on QB in the first session and on NQB in the second session. The experiment groups and their sessions are shown in Table 3.
Table 3. Experiment grouping
Sessions 1 and 2 were conducted on different days. In each session both objective and subjective usability data were recorded. The objective data relate to the measured effectiveness and efficiency of using the tested layout, whereas the subjective data concern the perceived learnability of the layout. After both sessions had been completed, subjective data were collected from the participants once more, this time covering all three aspects of the usability evaluation, namely effectiveness, efficiency, and learnability. The objective and subjective data complement each other in our analysis.
With the abovementioned scenario, the between-subjects design was intended to obtain the usability comparison between the two layouts from (1) the perspective of new users with a negligible effect of previous learning and (2) the perspective of users who have briefly learned the other layout in the previous test. For the first purpose, the usability of the two layouts was compared using the result data from the first session of each group, i.e. the NQB test in group A versus the QB test in group B. For the second purpose, the usability of the two layouts was compared using the result data from the second session of each group, i.e. the QB test in group A versus the NQB test in group B. The hypotheses were tested based on the results of the NQB test and the QB test in each session.
On the other hand, the within-subjects design was used to obtain the usability comparison between the two layouts within each group, given the possible learning effect from the previous test. In group A, the result data of the NQB test in session 1 were compared with those of the QB test in session 2. Similarly, in group B the result data of the QB test in session 1 were compared with those of the NQB test in session 2. The hypotheses were tested based on the test result data within each group. This experiment design is shown in Table 4.
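The four comparisons implied by this design can be written down explicitly. The sketch below is only an illustration of the crossover structure described above; the names `SCHEDULE` and `comparisons` are hypothetical and not part of the study's tooling.

```python
# Layout tested by each group in each session, as described in the text.
SCHEDULE = {
    ("A", 1): "NQB", ("A", 2): "QB",
    ("B", 1): "QB", ("B", 2): "NQB",
}


def comparisons():
    """List the (design, first cell, second cell) comparisons of the experiment."""
    result = []
    # Within-subjects: same group, session 1 versus session 2.
    for group in ("A", "B"):
        result.append(("within", (group, 1), (group, 2)))
    # Between-subjects: same session, group A versus group B.
    for session in (1, 2):
        result.append(("between", ("A", session), ("B", session)))
    return result


for design, first, second in comparisons():
    print(design, first, SCHEDULE[first], "vs", second, SCHEDULE[second])
```

Every comparison pairs one NQB cell with one QB cell, which is what allows each of the four analyses to test the same hypotheses.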

Participants
Sixty participants were recruited from our university for this study. Their ages ranged from 19 to 25 years. None of them reported any musculoskeletal problems with their hands. All of them had normal clarity of vision or had it corrected to normal with eyeglasses or contact lenses. Since they are Indonesians, all of them are QWERTY layout users. Their familiarity with Arabic characters and diacritics and their ability to read Arabic sentences was confirmed through a preliminary test. Most participants (44 of 60) had never typed in Arabic and the remaining had only done it on a trial basis.
In this study all participants were split into two groups with equal number of members, i.e. 30 participants in each group.

Procedure and Instruments
The two keyboard layouts, NQB and QB, were used in usability testing in a laboratory. IBM PC Arabic 101 was taken to represent NQB and Intellark was taken as an instance of QB. All participants of the same group performed usability testing at the same time in the same laboratory room. They sat on chairs of the same type and used desktop personal computers with equal specifications and the same physical QWERTY keyboards without any Arabic labels, markings, or engravings on the keys (IBM PC/Windows 104 standard US layout). The computers were set up on the tables in the same standard way. The typing test for NQB used Microsoft Word and the test for QB used Intellark's online typing application. Each participant attended one session per day, for a total of two sessions over two days in the laboratory. In each session they tested a different keyboard layout according to the group allocation in Table 3. Each session took approximately 60 minutes to complete. When participants arrived and were seated, they were given user manuals containing a description and technical details of the corresponding keyboard layout, including QWERTY-to-Arabic mapping tables and diagrams. After receiving information about the test protocol, they had 15 minutes to try the keyboard layout. Once they had completed the trial, they were asked to type the Arabic sentences given in the test material as accurately as possible within 30 minutes. The test material contained Arabic sentences taken from various verses of the Qur'an. These sentences were selected so that every primary Arabic alphabetic character and diacritic was included. The test protocol strictly forbade the use of cut, copy, and paste functions and the use of on-screen keyboards during the typing test. However, the participants were allowed to consult the user manual at any time.
If a participant finished all the material before the 30 minutes had elapsed, the stop time was recorded and reported to the invigilator, and the test duration was therefore less than 30 minutes. Otherwise, the test stopped at 30 minutes. Having completed the typing test of a session, every participant was asked to answer a post-session questionnaire. The questionnaire asked them to rate the accuracy (effectiveness), typing speed (efficiency), and independence from the user manual (learnability) with which the corresponding layout could be used. After completing all sessions, the participants were asked to answer a final post-test questionnaire on the overall usability of the keyboard layouts tested.

Outcome measurements
The usability of two different keyboard layouts is the main feature evaluated in this study. We selected the characteristics of usability that are most relevant to the most widely referenced qualities in typing. These are effectiveness, efficiency, and learnability. They were further related to more specific concepts and operationalized into practical measurements.

Effectiveness
In the context of typing, the effectiveness of using a keyboard layout is logically related to the accuracy of typing with that layout. The accuracy of typing is the proportion of correct typing results among the total typing results. Therefore, in this study the accuracy of typing (Acc) is computed as the number of correct characters typed (CCT) divided by the number of characters typed (CT), as written in Eq. (1):
\[ \mathrm{Acc} = \frac{\mathrm{CCT}}{\mathrm{CT}} \tag{1} \]
Characters include alphanumeric symbols, spaces, and Arabic diacritics. The accuracy of typing can also be expressed as a percentage.
The accuracy of typing was calculated for every participant in all sessions. The data would be used to describe the participants' performance in every session and to test H0a hypothesis.
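As a minimal sketch of the accuracy measure, the following Python code computes Acc from CCT and CT. The study does not state how correct characters were counted, so aligning the typed text with the reference text using `difflib.SequenceMatcher` is an assumption made here for illustration, as are the function names and the sample strings. Python's `len` counts Unicode code points, so each Arabic diacritic counts as one character, in line with the definition above.

```python
from difflib import SequenceMatcher


def count_correct(reference: str, typed: str) -> int:
    """Count typed characters that align with the reference (assumed method)."""
    matcher = SequenceMatcher(None, reference, typed, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def accuracy(cct: int, ct: int) -> float:
    """Acc = CCT / CT; returns 0.0 when nothing was typed."""
    return cct / ct if ct else 0.0


# Illustrative strings: beh, kasra, seen, sukun, meem, kasra.
reference = "\u0628\u0650\u0633\u0652\u0645\u0650"
# Same word typed with a fatha in place of the first kasra.
typed = "\u0628\u064e\u0633\u0652\u0645\u0650"

cct = count_correct(reference, typed)
print(f"CCT = {cct}, CT = {len(typed)}, Acc = {accuracy(cct, len(typed)):.1%}")
```

Here a single wrong diacritic lowers the accuracy even though every base letter is correct, which reflects the decision to count diacritics as characters.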
In addition to objective data, we also collected subjective data about accuracy, which reflect the relative perceived accuracy of typing. These subjective data were taken from a questionnaire after all sessions had been completed. The question was "Which keyboard layout can help typing in Arabic more accurately?" and the participants chose either "IBM PC Arabic (NQB)" or "Intellark (QB)".

Efficiency
In the context of typing, the efficiency of using a keyboard layout is strongly related to the typing speed achieved with that layout. Typing speed is normally expressed in CPM (characters per minute) or WPM (words per minute). To avoid ambiguity in determining what counts as a word, CPM was selected in this study. With respect to errors made during typing, CPM can be calculated either to include uncorrected errors among the characters typed or to count only correctly typed characters, including those that were corrected. Since typing speed in our context is related to efficiency, it makes more sense to choose the latter. Thus, the typing speed (Vt, in CPM) is computed as the number of correct characters typed (CCT) divided by the typing time (t, in minutes), as written in Eq. (2):
\[ V_t = \frac{\mathrm{CCT}}{t} \tag{2} \]
In our experiment, the typing time would be equal to 30 minutes or less because there was a fixed maximum of typing time and the participant might finish typing earlier than that time. The time spent by participants to correct errors or having brief breaks is considered natural in the typing process and therefore was counted.
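The speed measure can be sketched in the same way. The function name `typing_speed_cpm`, the default 30-minute cap as a parameter, and the sample numbers are assumptions for illustration; they are not values from the experiment.

```python
def typing_speed_cpm(cct: int, minutes: float, max_minutes: float = 30.0) -> float:
    """Vt = CCT / t, where t is the typing time in minutes.

    The time includes pauses and error correction and is capped at the
    fixed session limit, since the test stopped at that point.
    """
    if minutes <= 0:
        raise ValueError("typing time must be positive")
    t = min(minutes, max_minutes)
    return cct / t


# Illustrative values only: the same 600 correct characters typed in the
# full 30 minutes and in 24 minutes.
print(typing_speed_cpm(600, 30))
print(typing_speed_cpm(600, 24))
```

A participant who finishes early is credited with a higher speed for the same number of correct characters, because the recorded stop time rather than the 30-minute limit is used as t.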
The subjective data about efficiency, which reflects the relative perceived efficiency of typing, was taken from a final post-test questionnaire. The question was "Which keyboard layout can help typing in Arabic more efficiently?" and the participants would choose either "IBM PC Arabic (NQB)" or "Intellark (QB)".

Learnability
Learnability is basically the extent to which something can be learned [44]. It is considered as a category of usability principles [13], a subscale of usability [45], a subcharacteristic of usability [34], one of usability objectives [31] or an aspect of use [34]. In its full scale, learnability is measured in a longitudinal study to obtain how much time and effort are required to become proficient with something [44]. However, in this study, we intended to discover the learnability in a much shorter period as the user encountered the keyboard.
Data about learnability was taken from subjective evaluation of the layouts. The first evaluation relates learnability with the user's relative independence from using the user manuals. As a main input device, keyboard layout design should be intuitive and easy to learn without the user consulting the user manuals too frequently. The evaluation data was taken from a questionnaire given after every session. This questionnaire adapted that in [40] because it served a similar purpose. It asked the participants if during the test they (1) checked the user manual every time they typed an Arabic character, (2) did not always check the user manual but did it more than 50% of the time, (4) still checked the user manual less than 50% of the time, or (5) never checked the user manual.
The second subjective data about learnability reflects the relative perceived learnability of typing with the corresponding keyboard layout. It was taken from a post-experiment questionnaire asking "Which keyboard layout can help typing in Arabic more efficiently?" and the participants would choose either "IBM PC Arabic (NQB)" or "Intellark (QB)".

Results and Discussions
The results and discussions of the effectiveness, efficiency, and learnability of the evaluated keyboard layouts, as well as their overall usability, are described below. Table 5 shows the typing accuracy and typing speed, which indicate the keyboard layout's effectiveness and efficiency respectively, for every participant in every session, while Table 6 shows the results of hypothesis testing.

Effectiveness
For the within-subjects studies, a paired t-test was used as suggested in [46]. The hypothesis tests within group A and within group B both rejected the null hypothesis H0a (p = 6.957e-5 within group A and p = 0.011 within group B; p < 0.05 in each group). This implies that using QB produced significantly more accurate results than using NQB, so hypothesis H1a is supported. In a within-subjects study there is a possibility of "carryover effects", where performance in one condition affects performance in another condition, e.g. as a result of practice or fatigue [44]. Carryover effects may have existed in our study, but this issue needs to be investigated further. They may have contributed to the large difference in t-values between the hypothesis test results of the two groups. Whatever the carryover effects may have been, both within-subjects tests show that the results of the QB test are significantly more accurate than those of the NQB test.
For the between-subjects studies, we applied the two-sample t-test recommended in [46]. In session 1, the use of NQB in group A and of QB in group B did not result in a significant difference in typing accuracy: the p-value is 0.292, which is higher than 0.05. H0a therefore cannot be rejected, and H1a is not supported. However, the same test applied to the session 2 results of each group shows that typing with QB gave significantly more accurate results than typing with NQB: the p-value is 0.0002, which is less than 0.05. In this case H0a is rejected and H1a is supported, the same result as in the within-subjects studies.
These two opposing findings from different sessions show that, although QB has been believed to be better than NQB, the first 45-minute interaction between participants and their assigned keyboard layout was unable to show that QB is superior to NQB in accuracy. It could be that the interaction time in session 1 was not long enough for participants to perform effectively when first using the layout. In session 2, on the other hand, QB produced higher accuracy than NQB. Given the participants' previous unfamiliarity with Arabic keyboard layouts and their new or infrequent use of Arabic typing, there may have been a carryover effect from session 1 that helped them become somewhat familiar with the keyboard layout and with Arabic typing. If QB is indeed better than NQB, this carryover effect may have given an advantage to those moving from typing on NQB in session 1 to typing on QB in session 2. They may have felt that they were moving from a more difficult situation to an easier one, while equipped with some learning experience. Those who moved from QB to NQB also had some learning experience from session 1. However, this carryover experience possibly could not help them much in session 2, because they then had to use NQB, which is possibly less usable than QB. This explanation needs to be confirmed by further research.
Table 6. Results of hypothesis testing in effectiveness, efficiency, and learnability

Efficiency
From the pattern of typing speed in Table 5 it is evident that the use of QB tends to produce higher speed than the use of NQB. The hypothesis testing confirms this in both the within-subjects and the between-subjects studies, as seen in Table 6. The paired t-test for the within-subjects studies gives p = 1.522e-14 within group A and p = 4.152e-08 within group B. As p < 0.05 for each group, the null hypothesis H0b is rejected and the alternative hypothesis H1b is supported in both cases. It is concluded that the typing speed resulting from using QB is significantly higher than that resulting from using NQB.
The hypothesis testing for typing speed in the between-subjects studies reveals similar results. The two-sample t-test gives p = 3.779e-09 for the cross-group evaluation in session 1 and p = 2.597e-06 for the cross-group evaluation in session 2. As p < 0.05 for both sessions, the null hypothesis H0b is rejected and H1b is supported. In both cases, the typing speed resulting from using QB is significantly higher than that resulting from using NQB.
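The two kinds of test used in these analyses can be sketched with the Python standard library. This is an illustration rather than the analysis script of the study: the function names and sample values are hypothetical, and because the text does not say whether the two-sample test assumed equal variances, the Welch form is used here as an assumption. Only the t statistic and degrees of freedom are computed; converting them to a p-value requires the t distribution, which is available in statistical packages.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(x, y):
    """Paired t statistic and degrees of freedom (within-subjects comparison)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def welch_t(x, y):
    """Two-sample Welch t statistic and approximate degrees of freedom
    (between-subjects comparison, equal variances not assumed)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Made-up typing speeds (CPM) for three participants, for illustration only.
qb_speeds = [23, 23, 25]
nqb_speeds = [21, 22, 23]
print(paired_t(qb_speeds, nqb_speeds))  # same participants, two sessions
print(welch_t(qb_speeds, nqb_speeds))   # treated as two independent groups
```

The paired form uses the per-participant differences, so it is the appropriate choice when the same group is measured in both sessions, whereas the two-sample form compares group A with group B within one session.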
An interesting observation arises when the between-subjects results for accuracy are considered together with the between-subjects results for efficiency in the same session of usability testing. In session 2, both the typing accuracy and the typing speed of QB use are higher than those of NQB use. Thus, in that case QB is more usable than NQB in both aspects. In session 1, by contrast, the typing speed of QB use is higher than that of NQB use, but the accuracy of QB use is lower than that of NQB use, although this difference in accuracy is not significant. This shows that participants' familiarity with QWERTY let them type more quickly with QB but did not automatically result in higher accuracy with that layout. This is possibly because they were new or infrequent users of Arabic typing and in session 1 they were suddenly asked to type in Arabic intensively for a short period. This situation was disadvantageous for achieving relatively high accuracy. The inconsistency between accuracy and typing speed did not appear in session 2, however, possibly because of the carryover effects from session 1 described in the previous section.

Learnability
Table 7 displays the frequency distribution of participants' selected answers to the questionnaire about the independence of users from using user manuals.
Table 7. Independence of users from using user manuals
In session 1 there are participants who checked the user manual every time they typed an Arabic character. This number is higher for NQB participants (30%, n = 9) than for QB participants (7%, n = 2). In fact, in this session almost all NQB participants (90%, n = 27) perceived that they spent more than 50% of their typing time checking the user manuals. Only 5 NQB participants (17%) checked the manuals less than 50% of the time and none performed the tests without checking the manuals. This is in contrast with the QB participants in that session, most of whom (90%, n = 27) checked the user manuals less than 50% of the time, while only 3 participants (10%) checked the manuals more than 50% of the time. In other words, QB participants were more independent from user manuals than NQB participants while typing with their respective keyboard layout.
The results of session 2 are similar to those of session 1. During the typing test, almost all QB participants in session 2 (93%, n = 28) checked the user manuals less than 50% of the time. The remaining 2 participants never consulted the manuals at all. On the other hand, most NQB participants (87%, n = 26) checked the manual more than 50% of the time. Eleven of them (37%) even checked the manuals every time they typed a character. Only 4 NQB participants consulted the manuals less than 50% of the time and none could type without looking at the manuals. In conclusion, NQB usage requires participants to check user manuals more frequently than QB usage.
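The percentages quoted for session 2 follow directly from the reported counts with 30 participants per group. The short sketch below recomputes them; the variable names are hypothetical, and the split of the 26 NQB participants into 11 who checked every time and 15 who checked more than 50% of the time is inferred from the text rather than read from Table 7.

```python
N = 30  # participants per group

# Session 2 counts per answer category, as reported or inferred from the text.
session2 = {
    "NQB": {"every time": 11, "more than 50%": 15, "less than 50%": 4, "never": 0},
    "QB": {"every time": 0, "more than 50%": 0, "less than 50%": 28, "never": 2},
}


def percentages(counts, n=N):
    """Convert answer counts into whole-number percentages of the group."""
    return {answer: round(100 * c / n) for answer, c in counts.items()}


for layout, counts in session2.items():
    assert sum(counts.values()) == N  # every participant gave one answer
    print(layout, percentages(counts))
```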
After applying the two-sample t-test to the between-subjects data on independence from user manuals, we found p = 8.305e-08 for the cross-group evaluation in session 1 and p = 5.960e-14 for the cross-group evaluation in session 2. As p < 0.05 for both sessions, the null hypothesis H0c is rejected and H1c is supported. The independence of QB participants from using manuals when typing is significantly higher than that of NQB participants.

Overall usability
Table 8 shows the perceptions of all participants on typing accuracy, speed, and learnability for both NQB and QB, taken from the final post-test questionnaire. The majority of participants preferred QB to NQB in all three aspects. However, the margin is small for the accuracy aspect in one of the groups. Almost all participants in group A (90%, n = 27) chose QB over NQB for its better support of accuracy, which is aligned with the H0a hypothesis testing in the previous within-subjects study using objectively measured accuracy in that group. In group B, however, the number of QB voters (57%, n = 17) is only slightly higher than that of NQB voters (43%, n = 13). This is interesting because, based on the objectively measured accuracy in group B, QB produced significantly higher typing accuracy than NQB. Besides, as shown in Table 5, most participants (73%, n = 22) have higher accuracy results on QB than on NQB. This means that although the accuracy of QB usage is objectively significantly higher than that of NQB, the proportion of participants perceiving that QB can help them type Arabic more accurately than NQB is comparatively modest (57%). This is not found in group A, and further research is needed to explain the reasons behind this phenomenon.
The perceived support for typing speed and learnability is higher for QB than for NQB. In group A all 30 participants preferred QB to NQB in terms of typing speed support, while in group B only one participant chose NQB over QB. This high preference for QB is consistent with the result of the H0b hypothesis testing. As for learnability, almost all participants (97%, n = 29) voted for QB over NQB in group A and all participants preferred QB to NQB in group B. This is also aligned with the result of the H0c hypothesis testing.

Limitation and further work
In evaluating NQB and QB in this study, we only selected one instance for each layout, namely Arabic Windows/IBM PC and Intellark respectively. There are other types of NQB and QB that can also be subject to evaluation. Arabic Mac running on widely used Mac OS may be included in the next evaluation. Similarly, other QB variants can also be evaluated because every QB variant has its own characteristics, merits, and potential problems to solve, other than their general QWERTY-based commonality.
Another limitation is the objective of the usability evaluation in this study. We focused on obtaining empirical evidence comparing the usability of QB and NQB. Further study may be performed to discover and understand the problems of using the available QB layouts so that improvements can be suggested. In our study, some phenomena observed during the evaluation were explained only to a limited extent. These include, for example, the possible reason behind the opposite findings of the between-group evaluations of accuracy and the small difference in perception of which layout helps typing more accurately. Further analysis using theories of user experience and longitudinal studies of user experience with keyboards may reveal more phenomena and explain them better. Age can affect performance in using a keyboard, and a further limitation of our study is the age range of our participants. We understand that the results of our study may not generalize to other age groups. Therefore, further study may include participants of other age groups who need to interact with Arabic keyboards. In terms of devices, we limited our evaluation to desktop computer usage. A study of NQB and QB usability on mobile devices would also be relevant and useful in this era of mobile computing.

Conclusion
This study aimed to (1) evaluate the usability of QWERTY-based Arabic keyboard layout (QB) and non-QWERTY-based Arabic keyboard layout (NQB) for QWERTY users, and (2) compare the evaluation results between the two layouts. The main hypothesis was that QB is higher in usability than NQB. Usability in this study covered effectiveness, efficiency, and learnability which were related to typing accuracy, typing speed, and independence from user manuals respectively. Thus, the hypothesis became that QB is significantly more effective, more efficient, and easier to learn than NQB. Using both within-subjects and between-subjects experiment designs with a total of 60 participants, almost all of the subhypotheses were supported. In terms of accuracy, the main subhypotheses were partially supported. The use of QB could significantly produce more effective result than that of NQB in the second session and when comparing the results of the first and the second sessions. However, there was no significant difference in effectiveness between QB and NQB test results in the first session. In terms of efficiency and learnability, the corresponding subhypotheses were fully supported. The evidence showed that QB was significantly more efficient to use and easier to learn than NQB. QB can be suggested as an alternative to NQB for QWERTY users, although further studies on its problems and improvements are needed.