Building our Site

When deciding what pages to include in our menu, I had to think carefully about what pages appear on typical websites. I decided that our Mission Statement should be our homepage so that when you arrive at our site, you immediately learn about our project and our goals. I revised the mission statement several times and finally settled on the finished version you see now.

My second thought was having a page explaining what exactly we mean by Language Maps and Language Clouds. Dr. Quizon thankfully authored this page with working links.

As a team, we decided to rename the blog page to “The Project”. This was a unanimous decision. We wanted to take people step by step through our process.

Our “Contact Us” page is for anyone who has questions, comments, or wants to use our research, which is covered by a Creative Commons license. The “Contribute” page will be an open forum for anyone who would like to add their languages to our research. We are now working with a WordPress expert who is going to build our questionnaire, which will feed directly into a Microsoft Excel spreadsheet that is already coded.

We encourage you to check back soon and contribute your own languages!

Using Viewshare

Viewshare is a website where people can upload a set of data, such as an Excel spreadsheet, and the program lets them create different charts, maps, lists, and timelines, depending on what kind of information it can read from the data. Professor Quizon set up the account and I, Ellie Hautz, explored its features with a mock Microsoft Excel spreadsheet to see how we could use Viewshare in our research.

We all worked on coding the information in the Microsoft Excel file. I took the final version and uploaded it to Viewshare to see what I could do with it, and I was excited to see how many charts I could make. When I looked at the map view, though, it had plotted points I had not intended. For instance, our entry for New York was meant to specify New York, NY, but the program read it as New York in the United Kingdom. I thought that entering coordinates might make the points plot correctly, so I had a discussion with the group to decide which coordinates we would use.

We decided that North New Jersey English would be based on Bergen County and South New Jersey English on Cape May County, since they were the most coastal counties to the north and south. The majority of our participants were from New Jersey, but a good number indicated other states or countries. For plotting these, we decided to use the capital of the state or country of origin of the language unless the participant specified otherwise. I added an extra section to my own Microsoft Excel spreadsheet with the coordinates for these areas; however, it still was not working properly. Looking closely, I saw that the program asked for the city and state and/or country of each data point, so I went through again and used the capital of every county, state, and country. Finally it worked and the map plotted correctly.
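
A minimal sketch of the idea, written in Python rather than Excel (we did the actual work by hand in the spreadsheet); the descriptors, place names, and approximate coordinates below are illustrative examples only, not our data:

# Hypothetical lookup: each regional descriptor gets a city, state/country,
# and approximate coordinates before the spreadsheet is uploaded for mapping.
PLACE_LOOKUP = {
    "North New Jersey English": ("Hackensack", "NJ, USA", 40.886, -74.044),
    "South New Jersey English": ("Cape May", "NJ, USA", 38.935, -74.906),
    # Other states or countries default to their capital city.
    "Texas": ("Austin", "TX, USA", 30.267, -97.743),
    "France": ("Paris", "France", 48.857, 2.352),
}

def annotate(rows):
    """Fill in city, region, latitude, and longitude for each data row."""
    for row in rows:
        match = PLACE_LOOKUP.get(row["place_descriptor"])
        if match:
            row["city"], row["region"], row["latitude"], row["longitude"] = match
    return rows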

Unfortunately, about a week after we corrected the data, Viewshare stopped being compatible with our first data set.

Place names: New Jersey/New York to global speech communities

Our team member Ellie Hautz was tasked with figuring out how to map place names that were provided as regional descriptors for a language. In a sense, individuals were identifying where a particular speech community lives, either as a result of their own lived experience or inferred from information shared by others. The map that this query generates is quite distinct: using Viewshare we see a wider and richer distribution, encompassing both the perceived geographical origins of a language and the locations of speech communities as witnessed or inferred by our respondents.

One of the most interesting decisions made by the team was how to capture the high incidence of descriptors for New Jersey and New York varieties of English. Viewshare required specific latitude/longitude codes in order to generate a map. As a New Jersey-based university, our team felt obliged to do justice to the richness of the local data before us. After exploring various strategies, we decided to adapt principles used in Rick Aschmann’s American Dialects website. Like Aschmann’s site, as well as the broader literature on English dispersal in the US, we used the Eastern seaboard as a starting point. However, to graphically capture what our respondents refer to as “northern NJ,” we decided to map it onto New Jersey’s northernmost seaside county (Bergen), using its county seat, Hackensack. Similarly, verbatim descriptors of “southern NJ” were mapped onto the southernmost seaside county (Cape May), using the same-named city. Aschmann used slightly different terms, however, because he was plotting nationally beyond a single state. For the purposes of this specific data set, what appears elsewhere as “Inland North” was coded as “Northern NJ”; what is referred to elsewhere as “Atlantic Midland” was coded as “Southern NJ.” Whenever New York was mentioned, the variety descriptor “Greater New York City” was used.

[Figure: cropped-DH1Language-City-PackedBubbles-1.png]
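
As a rough illustration of this relabeling step, here is a sketch in Python (not part of our actual workflow); the spellings on the left are invented examples of what respondents might write, not entries from our cards:

# Hypothetical normalization of verbatim descriptors to the variety labels
# used for this data set; the left-hand spellings are invented examples.
VARIETY_LABELS = {
    "northern nj": "Northern NJ",      # elsewhere: "Inland North"
    "north jersey": "Northern NJ",
    "southern nj": "Southern NJ",      # elsewhere: "Atlantic Midland"
    "south jersey": "Southern NJ",
    "new york": "Greater New York City",
}

def normalize(descriptor):
    """Return the standardized variety label, or the descriptor unchanged."""
    return VARIETY_LABELS.get(descriptor.strip().lower(), descriptor)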

Interestingly, our college campus is located in Essex County, NJ, in a geographical region that falls at the cusp of the language varieties of Northern NJ/Inland North and Greater New York City.

Microsoft Excel: Randomized Number I.D. for Participants

In order to present our data by participant, the ethical approach was to avoid revealing the actual names of the individuals who gave us our data. To do this, we used Microsoft Excel to generate and assign random numbers, rather than simply numbering every subject sequentially. These numbers would then act as the I.D.s for each participant. On a separate spreadsheet, we put participants’ first and last names in columns ‘A’ and ‘B’ respectively (here I have put in ten fake names* to show you an example). For our data, we had a list of all the participants’ names in alphabetical order by last name.

[Screenshot: random-number-excel-part-3]

Next, I used the RAND, or random, function. By putting the =RAND() function into column ‘C’ in cells C1 to C10, we were given a random decimal number in each cell. Then I tried to use the =RANDBETWEEN function in column ‘D’, inputting =RANDBETWEEN(1,10). Although this gave us a random whole number between 1 and 10, there were repeats of the same number. So now one of the biggest problems was finding a way to have Excel create random integers that did NOT repeat.

[Screenshot: random-number-excel-part-2]

Finally, with a little help from the library and the internet, I used the following formula to generate NON-REPEATING whole numbers in column ‘D’:
=MATCH(LARGE($C$1:$C$10,ROW()),$C$1:$C$10,0)


For each row, the formula finds the ROW()-th largest value in column ‘C’ (LARGE) and returns that value’s position in the column (MATCH), so column ‘D’ fills with a shuffled sequence of 1 through 10 with no repeats. The result was what we were looking for: anonymity for our participants. With this success, we copied and pasted the numbers next to the names in the list of participants in our data set.**

[Screenshot: random-number-excel-part-1]

*None of these names are meant to have any relation to any person(s) alive or deceased.
**When I input the function into column ‘D’, the random values in column ‘C’ recalculated automatically, but remained random. You need to keep the =RAND() formula in column ‘C’ in order for the function in ‘D’ to work.


There may be other ways of achieving the same outcome, but this formula worked best for us in Excel.
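
For instance, the same idea can be sketched outside of Excel. Here is a rough Python version (not something we used for the project): each row gets a random key, and its ID is the key’s rank, so the IDs form a non-repeating shuffle of 1 through n. The names are invented placeholders:

import random

# Invented placeholder names -- not the project's participants.
participants = ["Ada Example", "Ben Example", "Cal Example", "Dee Example"]

# Column 'C' equivalent: one random decimal per row.
keys = [random.random() for _ in participants]

# Column 'D' equivalent: each row's ID is the rank of its key among all keys,
# which yields a shuffled, non-repeating sequence of 1 through n.
ranked = sorted(keys, reverse=True)
ids = [ranked.index(k) + 1 for k in keys]

for anon_id, name in zip(ids, participants):
    print(anon_id, name)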

The Coding Process

As of July, our research interns have officially begun coding the data extracted from the note cards.

The first step was moving all the raw data from the individual note cards to an Excel spreadsheet. Once we finished transcribing the data verbatim from the cards, we noticed that the individual descriptors on each card would make coding the spreadsheet difficult. What undoubtedly made the cards unique also made them so varied that coming up with a coding system would be an ambitious task. We wanted to keep the authenticity of the raw data while also coding the entries in an easily understood manner, making them significantly easier for us to plot.

[Box: the unique descriptors that students wrote on their note cards]

Before establishing the coding system, we had to answer some questions: If the language is Spanish, but the card identifies the Dominican Republic or Puerto Rico, do we code that region as Spain or as one of those two countries? What can we assume from the cards, if anything? Since each research intern takes part in all steps of the process, establishing a concise coding system is essential so that every card is coded the same way.

Generally, the beginning of each research meeting is spent discussing any coding problems that have come up. We are still coding the First Data Set and have started coding the Second Data Set.